Abstract


The distributed nature of information in a distributed system is one of the major issues that protocols for cooperation and coordination between individual components in such systems must handle. Individual sites customarily have only partial knowledge about the general state of the system. Moreover, different information is available at different sites of the system. Consequently, a central role of communication in such protocols is to inform particular sites about events that take place at other sites, and to transform the system's state of knowledge in a way that will guarantee the successful achievement of the goals of the protocol.

This thesis takes a few initial steps towards the study of the role of knowledge in distributed systems. We present a general framework for defining knowledge in a distributed system, and identify a variety of states of knowledge of sets of processors, which seem to capture some basic aspects of coordinated actions in a distributed environment. This machinery is applied to the analysis of a number of problems. We generalize and extend the well-known coordinated attack problem, which deals with the effects of unreliable communication on coordination in a distributed system. We analyze a generalized version of the cheating wives puzzle, obtaining insight into the subtle differences between broadcasting messages via different communication channels, and into the subtle interaction between knowledge, communication and action. Finally, we apply this machinery to the study of fault-tolerance in systems of unreliable processors, providing considerable insight into the Byzantine agreement problem, and obtaining improved protocols for Byzantine agreement and many related problems.
Introduction


Distributed systems of computers are rapidly gaining popularity in a wide variety of applications. The cooperation and coordination between computers at distinct locations made possible in a distributed system greatly enhances the usefulness of the individual computers in the system. Such cooperation and coordination is carried out by the execution of distributed protocols (sometimes also called distributed programs or plans) at the different sites of the system. However, the distributed nature of control and information in such systems makes the design and analysis of distributed protocols a complex task. Unfortunately, at the current time the design of distributed protocols is more an art than a science. Basic foundations, general techniques, and a clear methodology are urgently needed to improve our understanding and ability to deal effectively with distributed systems.

Whereas the tasks that distributed protocols are required to perform are normally stated in terms of the overall behavior of the system, the actions of an individual processor in a distributed system can only depend on its local information. This local information varies from site to site, and generally provides only a partial view of the state of the system. In determining the interaction between the individual processors, a distributed protocol must ensure that the states of knowledge attained by the system during an execution of the protocol allow the achievement of the protocol's goals. Thus, reasoning about the system's state of knowledge seems to be an important part of the design of distributed protocols. Indeed, designers of such protocols frequently find it useful to reason intuitively about processors' states of knowledge at various points in the execution of a protocol. However, formal descriptions of distributed protocols, as well as actual proofs of their correctness or impossibility, have traditionally avoided any explicit treatment of knowledge. Rather, the intuitive arguments about the state of knowledge of components of the system are customarily buried in combinatorial proofs that are often unintuitive and hard to follow. Consequently, essentially the same proof is often repeated with slight variations for closely related models of distributed systems.

This thesis attempts to take a few initial steps towards making reasoning about knowledge in a distributed environment more explicit, and towards understanding the relationship between knowledge, communication, and action in a distributed system. It is hoped that explicitly reasoning about the states of knowledge of processors in a distributed system will provide a more general and uniform setting. Such a setting should offer genuine insight into the basic structure and limitations of protocols in such systems. Clear semantics for knowledge in a distributed system should reveal subtleties that may not be otherwise apparent, sharpen our understanding of basic issues, and improve even the high-level intuitive reasoning about knowledge so intimately involved in the design of distributed protocols and plans. In the long run, we hope that a theory of knowledge, communication, and action will prove rich enough to provide general foundations for a unified theoretical treatment of distributed systems.

The general concept of knowledge has received considerable attention in a variety of fields, ranging from Philosophy [Hi] and Artificial Intelligence [Me], to Game Theory [A] and Psychology [C1M]. A study of knowledge in distributed systems can greatly benefit from work done in those fields and the paradigms presented by them. Furthermore, given that it is somewhat easier to formally model the “knowers” and their knowledge in a distributed system, work on knowledge in distributed systems promises to shed light on aspects of knowledge that are relevant to related fields.

In the next section we look at the “muddy children” puzzle, which illustrates some of the subtleties involved in reasoning about knowledge in the presence of many “knowers” or “agents”. In Section 1.2 we introduce a hierarchy of states of knowledge that a group may be in. Section 1.3 focuses on the relationship between knowledge and communication by looking at the coordinated attack problem. Section 1.4 contains an outline of the thesis.

1.1 The “muddy children” puzzle

A crucial aspect of distributed protocols is the fact that a number of different processors cooperate in order to achieve a particular goal. Since more than one individual is present, an individual may have knowledge about other individuals' knowledge in addition to his knowledge about the physical world. This often requires care in distinguishing subtle differences between seemingly similar states of knowledge. A classical example of this phenomenon is the “muddy children” puzzle, a variant of the well-known “wise men” or “cheating wives” puzzles. The version given here is taken from [B]:


Imagine n children playing together. The mother of these children has told them that if they get dirty there will be severe consequences. So, of course, each child wants to keep clean, but each would love to see the others get dirty. Now it happens during their play that some of the children, say k of them, get mud on their foreheads. Each can see the mud on others but not on his own forehead.

So, of course, no one says a thing. Along comes the father, who says, “At least one of you has mud on your head,” thus expressing a fact known to each of them before he spoke (if k > 1). The father then asks the following question, over and over: “Can any of you prove you have mud on your head?” Assuming that all the children are perceptive, intelligent, truthful, and that they answer simultaneously, what will happen?

There is a “proof” that the first k − 1 times he asks the question, they will all say “no” but then the kth time the dirty children will answer “yes.”

The “proof” is by induction on k. For k = 1 the result is obvious: the dirty child sees that no one else is muddy, so he must be the muddy one. Let us do k = 2. So there are just two dirty children, a and b. Each answers “no” the first time, because of the mud on the other. But, when b says “no,” a realizes that he must be muddy, for otherwise b would have known the mud was on his head and answered “yes” the first time. Thus a answers “yes” the second time. But b goes through the same reasoning. Now suppose k = 3; so there are three dirty children, a, b, c. Child a argues as follows. Assume I don’t have mud on my head. Then, by the k = 2 case, both b and c will answer “yes” the second time. When they don’t, he realizes that the assumption was false, that he is muddy, and so will answer “yes” on the third question. Similarly for b and c.
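The inductive argument can be checked mechanically on small cases. The sketch below is a minimal simulation that rests on several assumptions not found in the text:

- A world is a tuple of 0/1 foreheads.
- A child cannot distinguish worlds that differ only in its own forehead.
- The father's announcement deletes the all-clean world.
- Each public round of answers deletes every world that would have produced different answers.

The names `can_prove` and `play` are illustrative only.

```python
from itertools import product

def can_prove(world, child, worlds):
    # The child sees every forehead but its own, so it considers possible exactly
    # the remaining worlds that agree with `world` everywhere except at `child`.
    return all(w[child] == 1 for w in worlds
               if all(w[j] == world[j] for j in range(len(world)) if j != child))

def play(actual, rounds):
    """Answers given in each round when the father first announces
    "at least one of you is muddy"."""
    n = len(actual)
    worlds = set(product((0, 1), repeat=n))
    worlds.discard((0,) * n)  # the father's public statement rules out "all clean"
    history = []
    for _ in range(rounds):
        answers = tuple(can_prove(actual, i, worlds) for i in range(n))
        history.append(answers)
        # The answers are public: drop every world that would have produced other answers.
        worlds = {w for w in worlds
                  if tuple(can_prove(w, i, worlds) for i in range(n)) == answers}
    return history
```

Consider three children of whom the first two are muddy. Here `play((1, 1, 0), rounds=2)` gives all-“no” answers in the first round. In the second round exactly the two muddy children answer “yes”, matching the k = 2 case above.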

Let us denote the fact “At least one child has a muddy forehead” by m. Notice that if k > 1, i.e., more than one child has a muddy forehead, then every child can see at least one muddy forehead, and the children initially all know m. Thus, it would seem, the father does not need to tell the children that m holds when k > 1. But this is false! In fact, had the father not announced m, the muddy children would never have been able to conclude that their foreheads are muddy. We now sketch a proof of this fact.

First of all, given that the children are intelligent and truthful, a child with a clean forehead will never answer “yes” to any of the father’s questions. Thus, if k = 0, all of the children answer all of the father’s questions “no”. Assume inductively that if there are exactly k muddy children and the father does not announce m, then all children answer “no” to all of the father’s questions. Note that, in particular, when there are exactly k muddy foreheads, a child with a clean forehead initially sees k muddy foreheads and hears all of the father’s questions answered “no”. Now assume that there are exactly k + 1 muddy children. We prove by induction on the number of questions asked that all of the children answer “no” to all of the father’s questions. Assume inductively that all of the children have answered “no” to the father’s first n questions (for n = 0 this condition is vacuously true). Recall that a clean child will necessarily answer “no” to the father’s (n + 1)st question. Next observe that before answering the father’s (n + 1)st question, a muddy child has exactly the same information as a clean child has at the corresponding point in the case of k muddy foreheads: it sees k muddy foreheads and has heard n answers of “no”. It follows that the muddy children must all answer “no” to the father’s (n + 1)st question, and we are done. (Note that a very similar proof shows that if there are k muddy children and the father does announce m, his first k − 1 questions are answered “no”.)
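A hypothetical possible-worlds simulation makes the role of the announcement visible. This is an illustrative model, not the formal framework of Chapter 2. Without the announcement, the all-clean world is never ruled out. Every child therefore always considers it possible that its own forehead is clean, and no answer ever changes. The `announce` flag and the function names are assumptions of this sketch.

```python
from itertools import product

def can_prove(world, child, worlds):
    # A child considers possible the worlds that agree with `world` on every
    # forehead except its own; it can prove it is muddy iff all of them have it muddy.
    return all(w[child] == 1 for w in worlds
               if all(w[j] == world[j] for j in range(len(world)) if j != child))

def play(actual, rounds, announce):
    n = len(actual)
    worlds = set(product((0, 1), repeat=n))
    if announce:
        worlds.discard((0,) * n)  # "at least one of you has mud on your head"
    history = []
    for _ in range(rounds):
        answers = tuple(can_prove(actual, i, worlds) for i in range(n))
        history.append(answers)
        # Public answers eliminate worlds that would have produced different answers.
        worlds = {w for w in worlds
                  if tuple(can_prove(w, i, worlds) for i in range(n)) == answers}
    return history
```

With `announce=False`, every round of `play` returns all-“no” answers, whatever the number of muddy children. With `announce=True`, the muddy children answer “yes” in round k.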

So, by announcing something that the children all know, the father somehow manages to give the children useful information! How can this be? Exactly what was the role of the father’s statement? In order to answer this question, we need to take a closer look at knowledge in the presence of more than one knower; this is the subject of the next section.

1.2 A hierarchy of states of knowledge

Although in different contexts knowledge may be assumed to mean different things, one property generally required of knowledge is that only true things be known, or, more formally, that knowledge satisfy the axiom

$$K_i\varphi \supset \varphi,$$

i.e., if an individual i knows $\varphi$, then $\varphi$ is true.¹ In Chapter 2 we will discuss specific interpretations for knowledge that seem to be particularly useful in the context of distributed systems. For the purposes of our discussion in the next few sections, we require only two properties of an individual’s knowledge. It must be a function of the individual’s view of the past, and it must satisfy the above axiom (we make this precise in Section 2.2).

¹ Notions that do not satisfy the $K_i\varphi \supset \varphi$ axiom are customarily thought of as belief.

Given a reasonable interpretation for individuals’ knowledge, how does the notion of knowledge generalize from an individual to a group? In other words, what does it mean to say that a group G of individuals knows a fact $\varphi$? More than one possibility is reasonable, with the appropriate choice depending on the application:

• $I_G\varphi$ (read “the group G has Implicit Knowledge of $\varphi$”): We say that G has implicit knowledge of $\varphi$ iff someone who had complete knowledge of what each member of G knows would know $\varphi$. Thus, roughly speaking, $\varphi$ is implicit knowledge in G if the knowledge about $\varphi$ is distributed among the members of G. For instance, if one member of G knows $\psi$ and another knows that $\psi \supset \varphi$, the group G can be said to have implicit knowledge of $\varphi$.

• $S_G\varphi$ (read “someone in G knows $\varphi$”): We say that $S_G\varphi$ holds iff some member of G knows $\varphi$. More formally,

$$S_G\varphi \equiv \bigvee_{i\in G} K_i\varphi.$$

• $E_G\varphi$ (read “everyone in G knows $\varphi$”): We say that $E_G\varphi$ holds iff all members of G know $\varphi$. More formally,

$$E_G\varphi \equiv \bigwedge_{i\in G} K_i\varphi.$$

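• $E^k_G\varphi$, $k \ge 1$ (read “$\varphi$ is $E^k$-knowledge in G”): $E^k_G\varphi$ is defined by

$$E^1_G\varphi \equiv E_G\varphi, \qquad E^{k+1}_G\varphi \equiv E_G E^k_G\varphi \quad \text{for } k \ge 1.$$

$\varphi$ is said to be $E^k$-knowledge in G if “everyone in G knows that everyone in G knows that ... that everyone in G knows that $\varphi$ is true” holds, where the phrase “everyone in G knows that” appears in the sentence k times. Equivalently,

$$E^k_G\varphi \equiv \bigwedge_{i_j\in G,\ 1\le j\le k} K_{i_1}K_{i_2}\cdots K_{i_k}\varphi.$$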

• $C_G\varphi$ (read “$\varphi$ is Common Knowledge in G”): Roughly speaking, $\varphi$ is said to be common knowledge in G if $\varphi$ is true, and is $E^k$-knowledge for all $k \ge 1$. In other words,

$$C_G\varphi \equiv \varphi \wedge E_G\varphi \wedge E^2_G\varphi \wedge \cdots \wedge E^m_G\varphi \wedge \cdots$$

In particular, $C_G\varphi$ implies all formulas of the form $K_{i_1}\cdots K_{i_n}\varphi$, where the $i_j$ are all members of G, for any finite n, and is equivalent to the (infinite) conjunction of all such formulas.

(The subscript G will be omitted when the group G is understood from context.) Clearly, the notions of group knowledge introduced above form a hierarchy, with

$$C\varphi \supset \cdots \supset E^{k+1}\varphi \supset \cdots \supset E\varphi \supset S\varphi \supset I\varphi \supset \varphi.$$
However, depending on the circumstances, these notions might not be distinct. For example, consider a model of parallel computation in which a collection of n processors share a common memory. If their knowledge is stored in memory then we arrive at a situation in which $C\varphi \equiv E^k\varphi \equiv E\varphi \equiv S\varphi \equiv I\varphi$. By way of contrast, in a distributed system in which n processors are connected via some communication network and each one of them has its own memory, it is clear that the above hierarchy is strict. Moreover, in such a system, every two levels in the hierarchy can be separated by an actual task, in the sense that there will be an action for which one level in the hierarchy will suffice, but no lower level will. It is quite clear that this is the case with $E\varphi \supset S\varphi \supset I\varphi$, and, as we are about to show, the “muddy children” puzzle is an example of a situation in which $E^{k+1}\varphi$ suffices to perform a required action, but $E^k\varphi$ does not. In the next section we will present the coordinated attack problem, in which $C\varphi$ will suffice to perform a required action, but for no k will $E^k\varphi$ suffice.
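To make the operators concrete, here is a small sketch on a finite possible-worlds (partition) model. The model is an assumption of this example; the thesis makes its own definitions precise only in Chapter 2. In this model an agent knows a set of worlds $\varphi$ at world w iff every world it cannot distinguish from w lies in $\varphi$. Implicit knowledge pools the members' information by intersecting their blocks. The two-agent model and all function names are hypothetical.

```python
def cell(partition, w):
    """The block of an agent's partition containing w: the worlds it cannot tell from w."""
    return next(block for block in partition if w in block)

def K(partition, worlds, phi):
    return {w for w in worlds if cell(partition, w) <= phi}

def S(parts, worlds, phi):  # someone in G knows phi
    return set().union(*(K(p, worlds, phi) for p in parts.values()))

def E(parts, worlds, phi):  # everyone in G knows phi
    return set.intersection(*(K(p, worlds, phi) for p in parts.values()))

def E_k(parts, worlds, phi, k):
    for _ in range(k):
        phi = E(parts, worlds, phi)
    return phi

def I(parts, worlds, phi):  # implicit knowledge: pool every member's information
    return {w for w in worlds
            if set.intersection(*(set(cell(p, w)) for p in parts.values())) <= phi}

def C(parts, worlds, phi):  # common knowledge
    # Iterate X -> X & E(X) starting from phi; on a finite model this stabilizes
    # at the conjunction phi and E phi and E^2 phi and ...
    x = set(phi)
    while True:
        nxt = x & E(parts, worlds, x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical two-agent model over worlds 0..3.
worlds = {0, 1, 2, 3}
parts = {
    "a": [frozenset({0, 1}), frozenset({2, 3})],
    "b": [frozenset({0}), frozenset({1, 2}), frozenset({3})],
}
phi = {1, 2, 3}
```

In this model every inclusion of the hierarchy holds, and several are strict. For instance, the fact {1} is implicit knowledge at world 1 although no single agent knows it. Likewise, $E\varphi$ holds at world 2 while $E^2\varphi$ does not.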

Returning to the muddy children puzzle, let us observe the state of the children’s knowledge of m: “At least one forehead is muddy”. Before the father speaks, $E^{k-1}m$ holds, and $E^k m$ doesn’t. To see this, consider the case k = 2 and suppose that Alice and Bob are the only muddy children. Clearly everyone sees at least one muddy child, so $Em$ holds. But the only muddy child that Alice sees is Bob, and, not knowing whether she is muddy, Alice considers it possible that Bob is the only muddy child. Alice therefore considers it possible that Bob sees no muddy child. Thus, although both Alice and Bob know m (i.e., $Em$ holds), Alice does not know that Bob knows m, and hence $E^2 m$ does not hold. A similar argument works for the general case. We leave it to the reader to check that when there are k muddy children, $E^k m$ suffices to ensure that the muddy children will be able to prove their dirtiness, whereas $E^{k-1}m$ does not. (A more detailed analysis of this argument, as well as a more general treatment of variants of the muddy children puzzle more closely related to distributed systems, appears in Chapter 5.)
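The claim that $E^{k-1}m$ holds but $E^k m$ does not can be checked in a hypothetical possible-worlds model, which is an assumption of this sketch. Worlds are 0/1 tuples of foreheads. Child i cannot distinguish a world from the one obtained by flipping its own forehead. In this model $E^j m$ holds exactly at the worlds with at least j + 1 muddy children. The function names are illustrative.

```python
from itertools import product

def considered(w, i, worlds):
    """Worlds child i cannot tell apart from w: w itself and w with i's forehead flipped."""
    flipped = w[:i] + (1 - w[i],) + w[i + 1:]
    return {v for v in (w, flipped) if v in worlds}

def everyone_knows(worlds, phi):
    n = len(next(iter(worlds)))
    return {w for w in worlds if all(considered(w, i, worlds) <= phi for i in range(n))}

def e_k(worlds, phi, k):
    for _ in range(k):
        phi = everyone_knows(worlds, phi)
    return phi

n = 4
worlds = set(product((0, 1), repeat=n))  # before the father speaks
m = {w for w in worlds if any(w)}        # "at least one forehead is muddy"
```

For the Alice and Bob case, take n = 3 and the world (1, 1, 0). That world lies in `e_k(worlds, m, 1)` but not in `e_k(worlds, m, 2)`.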

Thus, the role of the father’s statement was to improve the children’s state of knowledge of m from $E^{k-1}m$ to $E^k m$. In fact, the children have common knowledge of m after the father announces that m holds. Roughly speaking, the father’s public announcement of m to the children as a group results in all the children knowing m and knowing that the father has publicly announced m. Assume that it is common knowledge that all of the children know anything the father announces publicly. It is then easy to conclude that m is common knowledge once the father announces m. Every child knows m and knows that the father publicly announced m. So every child knows that all of the children know m and know that the father publicly announced m, and therefore $E^2 m$ holds. It is similarly possible to show that once the father announces m, $E^n m$ holds for all n, so $Cm$ holds (see Section 3.1 for further discussion). Since, in particular, $E^k m$ holds, the muddy children can succeed in proving their dirtiness.
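The effect of the announcement on common knowledge can be checked in a hypothetical possible-worlds model, assumed for this sketch:

- Worlds are 0/1 tuples of foreheads.
- Child i cannot distinguish a world from the one with its own forehead flipped.
- The father's public announcement deletes the worlds where m is false.

The sketch computes common knowledge as the limit of $\varphi \wedge E\varphi \wedge E^2\varphi \wedge \cdots$. On a finite model this is a fixpoint computation. It shows $Cm$ failing everywhere before the announcement and holding everywhere after it.

```python
from itertools import product

def considered(w, i, worlds):
    """Worlds child i cannot tell apart from w: w itself and w with i's forehead flipped."""
    flipped = w[:i] + (1 - w[i],) + w[i + 1:]
    return {v for v in (w, flipped) if v in worlds}

def everyone_knows(worlds, phi):
    n = len(next(iter(worlds)))
    return {w for w in worlds if all(considered(w, i, worlds) <= phi for i in range(n))}

def common_knowledge(worlds, phi):
    # Iterate X -> X & E(X) from phi; on a finite model this stabilizes at
    # the conjunction phi and E phi and E^2 phi and ...
    x = set(phi) & worlds
    while True:
        nxt = x & everyone_knows(worlds, x)
        if nxt == x:
            return x
        x = nxt

n = 3
before = set(product((0, 1), repeat=n))
m = {w for w in before if any(w)}
after = before & m  # public announcement: discard the worlds where m is false
```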

A large part of the communication in a distributed system can also be viewed as the act of improving the state of knowledge (in the sense of “climbing up a hierarchy”) of certain facts. This is an elaboration of the view of communication in a network as the act of “sharing knowledge”. Taking this view, two notions come to mind. One is fact discovery: the act of changing the state of knowledge of a fact $\varphi$ from being implicit knowledge to levels of explicit knowledge (usually S-, E-, or C-knowledge). The other is fact publication: the act of changing the state of knowledge of a fact that is not common knowledge to common knowledge. An example of fact discovery is the detection of global properties of a system, such as deadlock. The system initially has implicit knowledge of the deadlock, and the detection algorithm improves this state to S-knowledge (see [CL] for work related to fact discovery). An example of fact publication is the introduction of a new communication convention in a computer network. Here the initiator(s) of the convention wish to make the new convention common knowledge.
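Here is a toy illustration of fact discovery. It is a hypothetical example, not the detection algorithms of [CL]. Suppose each processor knows only its own wait-for edges. A deadlock, i.e. a cycle in the combined wait-for graph, can then be implicit knowledge without any processor seeing it. Once a single collector has gathered all the edges, the deadlock becomes S-knowledge. The processor numbering, the local views, and the `has_cycle` helper are assumptions of the sketch.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    state = {}  # missing: unvisited, 1: on the current DFS path, 2: finished

    def visit(u):
        state[u] = 1
        for v in graph.get(u, []):
            if state.get(v) == 1:
                return True  # v is on the current path, so u -> v closes a cycle
            if v not in state and visit(v):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))

# Hypothetical local views: processor p knows only its own edges "p waits for q".
local_views = {1: [(1, 2)], 2: [(2, 3)], 3: [(3, 1)]}

# No single processor can detect the deadlock from its own view...
nobody_knows = not any(has_cycle(view) for view in local_views.values())
# ...but a collector that pools all views can: implicit knowledge becomes S-knowledge.
collector_knows = has_cycle([e for view in local_views.values() for e in view])
```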

In the following chapters we will devote a considerable amount of attention to fact publication and common knowledge. As we shall show, common knowledge is inherent in a variety of notions such as agreement, conventions, and coordinated action. Furthermore, having common knowledge of a large number of facts allows for better and shorter communication. Since these are goals frequently sought in distributed computing, the problem of fact publication (how to attain common knowledge) becomes crucial. Common knowledge is also a basic notion in everyday communication between people. For example, shaking hands to seal an agreement signifies that the handshakers have common knowledge of the agreement. Also, it can be argued that when we use a definite reference such as “the president” in a sentence, we assume common knowledge of who is being referred to (cf. [C1M]).

In [C1M], Clark and Marshall present two basic ways in which a group can come to have common knowledge of a fact. One is by membership in a community, e.g., the meaning of a red traffic light is common knowledge to the community of licensed drivers. The other is by being copresent with the occurrence of the fact, e.g., the father’s gathering the children and publicly announcing the existence of muddy foreheads made that fact common knowledge. Notice that if, instead, the father had taken each child aside (without the other children noticing) and told her or him about it privately, this information would have been of no help at all. Indeed, the children would probably think it was rather strange of him to tell them such an obvious fact.

In the context of distributed systems, community membership corresponds to information that the processors are guaranteed to have by virtue of their presence in the system (e.g., information that is “inserted into” the processors before they enter the system). However, it is not obvious how to simulate copresence using message passing in a distributed system. What is the analogue of making “public” announcements in a distributed system? As we shall see, there are serious problems in attempting to do so.

1.3 The coordinated attack problem

To get a flavor of the issues involved in attaining common knowledge by simulating copresence in a distributed system, consider the coordinated attack problem, taken from the operating systems folklore (cf. [Gal], [Gr], [YC]):

Two divisions of an army are camped on two hilltops overlooking a common valley. In the valley awaits the enemy. It is clear that if both divisions attack the enemy simultaneously they will win the battle, whereas if only one division attacks it will be defeated. The divisions do not initially have plans for launching an attack on the enemy, and the commanding general of the first division wishes to coordinate a simultaneous attack (at some time the next day). Neither general will decide to attack unless he is sure that the other will attack with him. The generals can only communicate by means of a messenger. Normally, it takes the messenger one hour to get from one encampment to the other. However, it is possible that he will get lost in the dark or, worse yet, be captured by the enemy. Fortunately, on this particular night, everything goes smoothly. How long will it take them to coordinate an attack?

We now show that despite the fact that everything goes smoothly, no agreement can be reached and no general can decide to attack. (This is, in a way, a folk theorem of operating systems theory; cf. [Gal], [Gr], [YC].) Suppose the messenger starts out in camp A carrying the message “Let’s attack at dawn”, and delivers it to camp B an hour later. General A does not immediately know whether the messenger succeeded in delivering the message. General B would not attack at dawn if the messenger were captured and failed to deliver the message. So general A will not attack unless he knows that the message was successfully delivered. Consequently, general B sends the messenger back to general A with an acknowledgement. Suppose the messenger delivers the acknowledgement to general A an hour later. Since general B knows that general A will not attack without knowing that B received the original message, he knows that A will not attack unless the acknowledgement is successfully delivered. Thus, general B will not attack unless he knows that the acknowledgement has been successfully delivered. However, for general B to know that the acknowledgement has been successfully delivered, general A must send the messenger back with an acknowledgement to the acknowledgement. Similar arguments can be used to show that no fixed finite number of acknowledgements, acknowledgements to acknowledgements, etc. suffices for the generals to attack. Note that in the discussion above the generals are essentially running a handshake protocol (cf. [Gr]). The above discussion shows that for no k does a k-round handshake protocol guarantee that the generals will be able to coordinate an attack.
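The failure of any fixed handshake can be seen by enumerating runs. This sketch assumes a particular, hypothetical protocol:

- Messages alternate: A sends the odd-numbered messages and B the even-numbered acknowledgements.
- A lost message ends the exchange, since nobody answers a message that never arrived.
- Each general attacks iff every message addressed to it in the k-message exchange has arrived.

Runs are indexed by how many messages got through. The names `attacks` and `disagreeing_runs` are illustrative.

```python
def attacks(general, k, delivered):
    """Hypothetical rule: attack iff every message addressed to `general` among
    messages 1..k arrived. B receives the odd-numbered messages, A the even ones."""
    due = [i for i in range(1, k + 1) if (i % 2 == 1) == (general == "B")]
    return all(i <= delivered for i in due)

def disagreeing_runs(k):
    """Runs (numbered by how many messages got through before the messenger
    was lost) in which exactly one general attacks."""
    return [j for j in range(k + 1) if attacks("A", k, j) != attacks("B", k, j)]
```

For every k ≥ 1 the only bad run is the one in which the last message is lost. Its sender has received everything it was waiting for and attacks, while its intended receiver does not.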

In fact, we can use this intuition to actually prove that the generals can never attack and be guaranteed that they are attacking simultaneously. We argue by induction on n, the number of messages delivered by the time of the attack, that n messages do not suffice. Clearly, if no message is delivered, then general B will not know of the intended attack, and a simultaneous attack is impossible. For the inductive step, assume that k messages do not suffice. If k + 1 messages suffice, then the sender of the (k + 1)st message attacks without knowing whether his last message arrived. Since whenever one general attacks they both do, the intended receiver of the (k + 1)st message must attack regardless of whether the (k + 1)st message is delivered. Thus, the (k + 1)st message is irrelevant, and k messages suffice, contradicting the inductive hypothesis.
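The inductive argument covers every protocol, not just a particular handshake rule. The sketch below checks all deterministic decision rules for small k in a hypothetical model with three features:

- Messages alternate.
- A lost message ends the exchange.
- A general's local state is just the number of messages it has received.

It also encodes the proof's base case: B, having heard nothing, does not attack. The flag `b_may_attack_unprompted`, an illustrative assumption, drops that constraint. In that case the trivial rule “always attack” agrees in every run.

```python
from itertools import product

def local_states(j):
    """Messages received by (A, B) when the first j messages get through."""
    return j // 2, (j + 1) // 2  # A receives even-numbered, B odd-numbered messages

def coordinating_rules(k, b_may_attack_unprompted=False):
    """All rule pairs (indexed by messages received) that attack when all k
    messages arrive and never let the generals disagree in any run."""
    runs = [local_states(j) for j in range(k + 1)]
    found = []
    for rule_a in product((False, True), repeat=k // 2 + 1):
        for rule_b in product((False, True), repeat=(k + 1) // 2 + 1):
            if rule_b[0] and not b_may_attack_unprompted:
                continue  # B has received nothing, so it knows of no plan to attack
            agree = all(rule_a[a] == rule_b[b] for a, b in runs)
            if agree and rule_a[runs[-1][0]]:
                found.append((rule_a, rule_b))
    return found
```

The empty result for every k mirrors the proof. Consecutive runs differ only in what one general has seen. Agreement therefore carries the decision from the run in which all messages arrive back to the run in which nothing was delivered.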

After presenting a detailed proof of the fact that no protocol the generals can use will satisfy their requirements and allow them to coordinate an attack, Yemini and Cohen in [YC] make the following remark:

... Furthermore, proving protocols correct (or impossible) is a difficult and cumbersome art in the absence of proper formal tools to reason about protocols. Such backward-induction argument as the one used in the impossibility proof should require less space and become more convincing with a proper set of tools.

Yemini and Cohen’s proof does not explicitly reason about knowledge, but it uses a many-scenarios argument to show that if the generals safely attack in one scenario, then there is another scenario in which one general will attack and the other will not. We feel that understanding the role knowledge plays in problems such as coordinated attack is a first step towards simplifying the task of designing and proving the correctness of protocols.
A protocol for the coordinated attack problem, if one did exist, would ensure that when the generals attack, they are guaranteed to be attacking simultaneously. Thus, in a sense an attacking general (say A) would know that the other general (say B) is also attacking. Furthermore, A would know that B similarly knows that A is attacking. It is easy to extend this reasoning to show that when the generals attack they in some sense have common knowledge of the attack. However, each message that the messenger delivers can add at most one level of knowledge about the desired attack, and no more. For example, when the messenger first arrives at camp B, general B knows about A’s desire to coordinate an attack. But A does not know whether the message was delivered, and therefore A does not know that B knows about the intended attack. And when the messenger returns to camp A with an acknowledgement, A knows that B knows about the intended attack. But B, not knowing whether the messenger delivered the acknowledgement, does not know that A knows (that B knows of the intended attack). This in some sense explains why the generals cannot reach an agreement to attack using a finite number of messages. We are about to formalize this intuition. Indeed, we will prove a more general result from which the inability to achieve a guaranteed coordinated attack will follow as a corollary. Namely, we will prove that communication cannot be used to attain common knowledge in a system in which communication is not guaranteed, and formally relate a guaranteed coordinated attack to attaining common knowledge. Before we do so, we need to define some of the terms that we use more precisely.
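The “one level per message” intuition can be computed in a hypothetical model of the exchange. The model is assumed for this sketch and is not the formal treatment of Chapter 3:

- Run j is the run in which the first j messages are delivered and the next one is lost.
- Each general's view is the number of messages it has received.
- $\varphi$ is the fact that B has received A's proposal.

The sketch reports the largest k such that $E^k\varphi$ holds when all n messages arrive. It also computes $C\varphi$ by iterating to a fixpoint.

```python
def partition(runs, view):
    """Group runs by a general's local view; runs in one block look identical to it."""
    blocks = {}
    for r in runs:
        blocks.setdefault(view(r), set()).add(r)
    return [frozenset(b) for b in blocks.values()]

def knows(blocks, phi):
    return {r for b in blocks if b <= phi for r in b}

def everyone_knows(parts, phi):
    return set.intersection(*(knows(blocks, phi) for blocks in parts))

def handshake_model(n):
    runs = range(n + 1)  # run j: messages 1..j delivered, message j + 1 lost
    parts = [partition(runs, lambda j: j // 2),        # A receives even-numbered messages
             partition(runs, lambda j: (j + 1) // 2)]  # B receives odd-numbered messages
    phi = {j for j in runs if j >= 1}  # B has received A's proposal
    return parts, phi

def depth(n):
    """Largest k such that E^k phi holds in run n (all n >= 1 messages delivered)."""
    parts, phi = handshake_model(n)
    current = phi
    for level in range(n + 1):
        nxt = everyone_knows(parts, current)
        if n not in nxt:
            return level
        current = nxt
    raise RuntimeError("E^k phi still held in run n after n + 1 levels")

def common_knowledge(n):
    parts, phi = handshake_model(n)
    x = set(phi)
    while True:  # iterate X -> X & E(X) until it stabilizes
        nxt = x & everyone_knows(parts, x)
        if nxt == x:
            return x
        x = nxt
```

In this model, after n delivered messages $E^{n-1}\varphi$ holds but $E^n\varphi$ does not, and $C\varphi$ holds in no run at all.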

1.4 Outline of the thesis

In  this  chapter  we  have  discussed  some  of  the  motivation  for  studying  knowl¬ 
edge  in  a  distributed  environment,  considered  two  puzzles  -  the  muddy  children 
puzzle  and  the  coordinated  attack  problem  -  which  illustrated  some  of  the  sub¬ 
tleties  involved  in  reasoning  about  knowledge  in  a  distributed  environment,  and 
introduced  a  variety  of  states  of  knowledge  that  correspond  to  the  knowledge  of 
a  group  of  individuals.  In  the  next  chapter  we  give  a  brief  formal  definition  of  a 
distributed  system  and  present  a  general  framework  for  ascribing  knowledge  (and 
belief)  to  processors  in  such  a  system.  Chapter  3  deals  with  the  general  problem  of 
fact publication (attaining common knowledge of new facts); this results, among other things, in a formal proof of a generalization of the coordinated attack problem, and establishes a close relationship between common knowledge and simultaneous actions. Chapter 4 introduces states of knowledge, related to common knowledge, that correspond to various types of coordinated actions, and states of knowledge relative to a relativistic notion of time. Results about the inability to attain some of these weaker states of knowledge when communication is unreliable further generalize the coordinated attack problem. Chapter 5 is a case study of
the relationship between knowledge, action, and communication: within the context of a fictional story, it analyzes variants of the cheating husbands puzzle (essentially the muddy children puzzle) in which a message is broadcast over a variety of communication media. Chapter 6 applies the formalism developed in Chapter 2 to the study of systems of unreliable processors and the Byzantine agreement problem. It shows that an analysis of when facts that are implicit knowledge become common knowledge in such a system provides considerable insight into the fundamental structure of fault-tolerant protocols, resulting in improved protocols for the Byzantine agreement problem and many related problems.
Chapter  7  includes  some  concluding  remarks. 


Chapter  2 


Modeling  Knowledge  in  Distributed  Systems 


In order to be able to reason explicitly about knowledge in a distributed system, one needs to have the means to make precise statements about the state of knowledge of the system at any given point. We need a way of determining both what individual processors know and what states of knowledge groups of processors have. Given that for different applications we are interested in different interpretations of knowledge, this chapter will present a very general framework for ascribing
knowledge  to  processors  in  a  distributed  system.  We  start  by  presenting  a  model  of 
a  distributed  system  in  Section  2.1.  In  Section  2.2  we  discuss  how  knowledge  (and 
beliefs)  can  be  ascribed  to  processors  in  such  a  system.  Section  2.3  introduces  a 
special  class  of  interpretations  of  knowledge  in  a  distributed  system  that  will  prove 
to  be  very  useful  later  on.  Finally,  in  Section  2.4  we  discuss  some  of  the  implications 
of  our  definitions. 

2.1  A  general  model  of  a  distributed  system 

We view a distributed system as a finite collection $\{p_1, p_2, \ldots, p_n\}$ of two or more processors that are connected by a communication network. We assume an external source of “real time” that in general is not directly observable by the processors. The processors are state machines that possibly have clocks, where a clock is a monotone nondecreasing function of real time. The processors communicate with each other by sending messages along the links in the network. At a given real time, a processor’s message history is the sequence of messages it has sent and received (in the order they were sent/received) up to (but not including) that time. If the processor has a clock, then the messages are also marked by the time on the processor’s clock at which they were sent or received. Every processor’s message history is always finite. (In particular, this implies that only a finite number of messages can be delivered in the system in a finite amount of time.)

A  processor  is  assumed  to  be  in  a  distinguished  “sleeping”  state  until  it  “wakes 
up”  or  joins  the  system  at  some  point  in  real  time.  The  real  time  at  which  the 
processor wakes up is called the processor’s initial time. The processor’s internal state when it wakes up is called its initial state. Before waking up, a processor sends and receives no messages. Thus, a processor’s message history when it wakes up is empty. After it wakes up, we assume that a processor follows (or executes) some deterministic protocol.¹ We give a formal definition of protocols shortly.

A  protocol  is  a  function  specifying  what  actions  a  processor  takes  (which  in  our 
case  amounts  to  what  messages  it  sends)  at  any  given  point  as  a  function  of  the 
processor’s  initial  state,  message  history,  and  the  range  of  values  that  its  clock  has 
read  since  the  processor  “woke  up”.  (We  restrict  our  attention  to  deterministic 
protocols  for  notational  and  conceptual  clarity.  A  nondeterministic  protocol  can  be 
thought  of  as  a  family  of  deterministic  protocols,  each  corresponding  to  a  particular 
sequence of nondeterministic choices. By this observation, our negative results will immediately carry over to nondeterministic protocols.) A joint protocol for a set G of processors is a tuple consisting of a protocol for every processor in G.

A processor’s message history function determines the processor’s message history as a function of real time. A processor’s clock time function determines the processor’s clock time as a function of real time. A run r of a distributed system is a complete history of its behavior, from the beginning of time until the end of time. This includes each processor $p_i$’s initial time $t_0(p_i,r)$, its initial state (its state at time $t_0(p_i,r)$), message history function, clock time function, and the protocol $p_i$ follows. A run is consistent iff the actions taken by each processor at all times are precisely those that are specified by its protocol (cf. [HF]). We never consider inconsistent runs. A point is a pair $(r,t)$, where r is a run and t is a real number corresponding to the (real) time t.

Corresponding to every distributed system, given an appropriate set of assumptions about the properties of the system and its possible interaction with its environment, there is a natural set S of the possible runs of the system. This set essentially contains all the relevant information about the system. The relative behavior of clocks, the properties of communication in the system, and many other properties of the system are directly reflected in the properties of this set of runs. We will identify a distributed system with such a set S of its possible runs. As we shall see in the sequel, identifying a distributed system with a set of runs S allows us to define properties of a system formally in a rather clean way. A (possibly joint) protocol is said to be executed in S if there is a run of S in which it is executed.

1  For  the  purposes  of  our  discussion  through  Chapter  5  we  are  essentially  assuming 
that  processors  are  reliable,  i.e.,  they  are  guaranteed  to  follow  their  protocols  in  good 
faith.  Modeling  systems  in  which  processors  may  be  faulty,  and  furthermore  modeling 
knowledge  in  such  systems,  adds  another  layer  of  complexity,  as  we  will  see  in  Chapter  6. 
Intuitively, a processor’s view at a given point describes everything that the processor may have observed by that point in the run. More formally, we define a processor $p_j$’s view at a point $(r,t)$, denoted $v(p_j,r,t)$, to be a distinguished “inactive” view if $t < t_0(p_j,r)$, and otherwise to consist of $p_j$’s initial state, message history, and the range of values its clock has read since it woke up, together with the protocol $p_j$ is following in r. We include the protocol as part of the view because a processor may often have access to the protocol it is following.
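As an illustration of this definition, the following minimal sketch computes $v(p_j,r,t)$ for one processor’s part of a run. All names (`ProcessorRun`, `Event`, `view`) are hypothetical. Events are stored with their real times, which the processor itself cannot observe: without a clock only the order of events enters the view, and with a clock events are stamped with clock readings instead. As a simplifying assumption, the range of clock values is summarized by its two endpoints, which suffices for a monotone nondecreasing clock.

```python
from dataclasses import dataclass
from typing import Callable, NamedTuple, Optional, Tuple

INACTIVE = "inactive"  # the distinguished view before the processor wakes up


class Event(NamedTuple):
    time: float    # real time of the send/receive (not observable by the processor)
    kind: str      # "send" or "recv"
    message: str


@dataclass
class ProcessorRun:
    """One processor's part of a run r (hypothetical encoding)."""
    initial_time: float
    initial_state: str
    protocol: str
    events: Tuple[Event, ...]                          # sorted by real time
    clock: Optional[Callable[[float], float]] = None   # monotone nondecreasing, if any


def view(p: ProcessorRun, t: float):
    """v(p, r, t): what p has observed strictly before real time t."""
    if t < p.initial_time:
        return INACTIVE
    past = [e for e in p.events if e.time < t]  # the history excludes time t itself
    if p.clock is None:
        history = tuple((e.kind, e.message) for e in past)  # order only
        clock_range = None
    else:
        history = tuple((p.clock(e.time), e.kind, e.message) for e in past)
        clock_range = (p.clock(p.initial_time), p.clock(t))
    return (p.initial_state, history, clock_range, p.protocol)


# Usage: two runs that differ only in the real time at which "m" arrives.
early = ProcessorRun(0.0, "s0", "P", (Event(1.0, "recv", "m"),))
late = ProcessorRun(0.0, "s0", "P", (Event(2.5, "recv", "m"),))
print(view(early, 3.0) == view(late, 3.0))  # True: without a clock, real time is invisible
print(view(early, -1.0))                    # inactive
```

Giving both processors the clock `lambda x: x` makes the two views differ, since the clock stamps then reveal when the message arrived.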

2.2  Ascribing  knowledge  to  processors 

What does it mean for a processor to know a fact $\varphi$? In our opinion, there is no unique “correct” answer to this question. Different interpretations of knowledge in a distributed system are appropriate for different applications. For example, an interpretation by which a processor is said to know $\varphi$ only if $\varphi$ appears explicitly in a designated part of the processor’s storage (its “database”) seems interesting for certain applications. In other contexts we may be interested in saying that a processor knows $\varphi$ if the processor could deduce $\varphi$ from the information available to it (say by using a logical system such as the axioms of [Sa], [Hi], [Le], or [HM], possibly with a specified limitation on computational resources; cf. [Kon]). We will shortly define yet another interpretation of knowledge in a distributed system in which, roughly speaking, a processor will be said to know $\varphi$ if the processor’s state implies that $\varphi$ holds.

Although the notion of knowledge may have a number of interesting interpretations, there are two properties that we require any notion of knowledge in a distributed system to satisfy. First, a processor’s knowledge at a given point must be a function of its view of the run by that point: at two points at which the processor has the same view, the processor should know exactly the same facts. Second, under no circumstances should a fact $\varphi$ be false and be known to be true at the same time (i.e., only true facts can be known; this is exactly the content of the requirement that $K_i\varphi \supset \varphi$ hold). In order to make these intuitions precise, we proceed as follows.

We assume the existence of an underlying logical language of formulas for representing ground facts about the system. A ground fact is a fact about the state of the system that does not explicitly involve processors’ knowledge. Formally, a ground fact $\varphi$ will be identified with a set of points $\tau(\varphi) \subseteq S \times (-\infty,\infty)$. Given a run $r \in S$ of the system and a time t, we will say that $\varphi$ holds at $(r,t)$, denoted $(r,t) \models \varphi$, iff $(r,t) \in \tau(\varphi)$.
We extend the original language of ground formulas to a language that is closed under knowledge operators (for every formula $\varphi$ and processor $p_i$, $K_i\varphi$ is a formula), common knowledge operators (for every formula $\varphi$ and subset G of the processors, $C_G\varphi$ is a formula), and boolean connectives.² (In Chapter 4 we will also add temporal operators.)

Intuitively, an epistemic interpretation for the system is a specification of what every processor knows (or, more precisely, believes) at any given point as a function of the processor’s view of its history at that point. More precisely, an epistemic interpretation I is a function assigning to every processor $p_j$ at any given point $(r,t)$ a set $\mathcal{K}_j(r,t)$ of facts in the extended language that $p_j$ is said to “know”. $\mathcal{K}_j(r,t)$ is required to be a function of $p_j$’s view at $(r,t)$. Thus if $v(p_j,r,t) = v(p_j,r',t')$ for two points $(r,t)$ and $(r',t')$, then $\mathcal{K}_j(r,t) = \mathcal{K}_j(r',t')$.

Given an epistemic interpretation I, we now specify when a formula $\varphi$ of the extended language holds at a point $(r,t)$ (denoted $(I,r,t) \models \varphi$). If $\varphi$ is a formula of the original language (a “ground” formula), then $\varphi$ holds at $(r,t)$ iff $(r,t) \in \tau(\varphi)$ (i.e., iff $\varphi$ holds at $(r,t)$ according to the original semantics). If $\varphi$ is a conjunction or a negation, then its truth is defined based on the truth of its subformulas in the obvious way. If $\varphi$ is of the form $K_j\psi$, then $\varphi$ holds (i.e., $(I,r,t) \models K_j\psi$) iff $\psi \in \mathcal{K}_j(r,t)$. If $\varphi$ is of the form $C_G\psi$, then $(I,r,t) \models C_G\psi$ iff for all sequences $p_{i_1}, p_{i_2}, \ldots, p_{i_n}$ of members of G, it is the case that $(I,r,t) \models K_{i_1}K_{i_2}\cdots K_{i_n}\psi$.

When talking about a processor’s knowledge, we are not interested in epistemic interpretations that state that a processor knows a fact $\varphi$ when $\varphi$ is in fact false! (Otherwise we are dealing with belief or some related notion, but not strictly with knowledge; cf. [HM].) Given an epistemic interpretation I and a set of runs S, we say that I is a knowledge interpretation for S if for all processors $p_j$, times t, runs $r \in S$, and formulas $\varphi$ in the extended language, it is the case that $(I,r,t) \models K_j\varphi$ implies $(I,r,t) \models \varphi$. That is, a knowledge interpretation for S is an epistemic interpretation that, for the runs of S, satisfies the axiom $K_i\varphi \supset \varphi$.

We say that a processor $p_i$ supports $C_G\varphi$ at $(I,r,t)$ if $p_i$ knows all the formulas of the form $K_{i_1}K_{i_2}\cdots K_{i_n}\varphi$ that constitute $C_G\varphi$ (that is, for all sequences $p_{i_1}, p_{i_2}, \ldots, p_{i_n}$ of members of the group G, $(I,r,t) \models K_iK_{i_1}K_{i_2}\cdots K_{i_n}\varphi$ holds).

2  When  appropriate,  operators  for  implicit  knowledge  and  other  states  of  knowledge 
should  also  be  added,  although  for  simplicity  we  do  not  add  them  here.  We  will  comment 
on implicit knowledge in the next section, and make strong use of it in Chapter 6.
For a formal treatment of implicit knowledge, see also [HM].
Lemma 2.1: Let I be a knowledge interpretation for S and let $r \in S$. For all processors $p_i \in G$ at all points $(r,t)$, it is the case that $p_i$ supports $C_G\varphi$ at $(I,r,t)$ iff $(I,r,t) \models C_G\varphi$.

Proof: The ‘if’ direction follows directly from the definition. For the other direction, assume to the contrary that $p_i \in G$ supports $C_G\varphi$ and that $(I,r,t) \not\models C_G\varphi$. Since $(I,r,t) \not\models C_G\varphi$, there must be some formula $\psi$ of the form $K_{i_1}K_{i_2}\cdots K_{i_n}\varphi$ such that $(I,r,t) \not\models \psi$. But since $p_i$ supports $C_G\varphi$ at $(I,r,t)$, we must have $(I,r,t) \models K_i\psi$. It follows that I does not satisfy $K_i\psi \supset \psi$ at $(r,t)$, and thus I is not a knowledge interpretation for S, contradicting our original assumption. □

2.3  State-based  knowledge  interpretations 

There are many possible knowledge interpretations for a given set of runs S. A very important class of interpretations that will form the basis of our discussion in a large part of this thesis is the class of state-based interpretations relative to S. In a state-based interpretation $I_\sigma$, a state $\sigma(p_j,r,t)$ is associated with every processor $p_j$ at any given point $(r,t)$. This state is required to be a function of $p_j$’s view at $(r,t)$. Thus, if $v(p_j,r,t) = v(p_j,r',t')$ then necessarily $\sigma(p_j,r,t) = \sigma(p_j,r',t')$. Roughly speaking, under this interpretation a processor in state $\sigma_0$ knows $\varphi$ iff $\varphi$ holds whenever the processor is in state $\sigma_0$ in a run of S. More formally, a state-based interpretation $I_\sigma$ is the unique interpretation satisfying: $\varphi \in \mathcal{K}_j^\sigma(r,t)$ iff

$$(I_\sigma,r',t') \models \varphi \text{ for all points } (r',t') \text{ with } r' \in S \text{ satisfying } \sigma(p_j,r,t) = \sigma(p_j,r',t').$$
Under a state-based knowledge interpretation, a processor does not know $\varphi$ exactly if it is possible for the processor to be in the same state and, at the same time, for $\varphi$ not to hold.³ Thus, under a state-based interpretation $I_\sigma$ we have:

$$(I_\sigma,r,t) \models K_i\varphi \text{ iff } (I_\sigma,r',t') \models \varphi \text{ for all } (r',t') \text{ satisfying } \sigma(p_i,r,t) = \sigma(p_i,r',t').$$

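Given the definition of $E_G\varphi$ from $K_i\varphi$ (namely, $E_G\varphi$ holds iff $K_i\varphi$ holds for every $p_i \in G$), we now also have that $(I_\sigma,r,t) \models E_G\varphi$ iff $(I_\sigma,r',t') \models \varphi$ for all $(r',t')$ satisfying $\sigma(p_j,r,t) = \sigma(p_j,r',t')$ for some $p_j \in G$.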
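On a finite set of points, the characterizations of $K_i\varphi$ and $E_G\varphi$ above can be evaluated directly. The sketch below rests on assumptions of our own: points are arbitrary hashable values, `sigma(p, point)` is a hypothetical state assignment, and a formula is represented semantically as a Python predicate on points, so that `K` and `E` build new predicates from old ones (and can therefore be nested).

```python
from typing import Callable, Hashable, Iterable, Sequence

Formula = Callable[[Hashable], bool]
StateFn = Callable[[str, Hashable], Hashable]


def make_operators(points: Sequence[Hashable], sigma: StateFn):
    """Return K and E for the state-based interpretation I_sigma over `points`."""

    def K(p: str, phi: Formula) -> Formula:
        # p knows phi at x iff phi holds at every point where p is in the same state.
        return lambda x: all(phi(y) for y in points if sigma(p, y) == sigma(p, x))

    def E(group: Iterable[str], phi: Formula) -> Formula:
        members = list(group)
        # Everyone in the group knows phi.
        return lambda x: all(K(p, phi)(x) for p in members)

    return K, E


# Usage: two points; processor "a" observes which point it is at, "b" observes nothing.
points = [0, 1]
sigma = lambda p, x: x if p == "a" else "blank"
K, E = make_operators(points, sigma)
bit_is_one = lambda x: x == 1
print(K("a", bit_is_one)(1))          # True
print(K("b", bit_is_one)(1))          # False
print(E(["a", "b"], bit_is_one)(1))   # False
print(K("b", lambda x: True)(0))      # True: a fact true at all points is known
```

Because the point being evaluated always lies in its own state class, `K(p, phi)(x)` can only be true where `phi(x)` is true, which is the requirement $K_i\varphi \supset \varphi$ for knowledge interpretations.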

3 Particular state-based interpretations were first suggested to us independently by Cynthia Dwork and by Stan Rosenschein.

Given S, G, and σ, we say that the point $(r',t')$ is reachable from $(r,t)$ if there exist points $(r_1,t_1), \ldots, (r_m,t_m)$ with $r_i \in S$ and processors $p_{j_1}, \ldots, p_{j_{m-1}} \in G$ such that $(r,t) = (r_1,t_1)$, $(r',t') = (r_m,t_m)$, and

$$\sigma(p_{j_i},r_i,t_i) = \sigma(p_{j_i},r_{i+1},t_{i+1}) \quad \text{for } 1 \le i < m.$$

(Notice that reachability is an equivalence relation: it is reflexive, symmetric, and transitive.) It is now easy to check from the definition of $E_G\varphi$ that $E_G^m\varphi$ holds for all $m \ge 1$ at the point $(r,t)$ iff $\varphi$ holds at all points reachable from $(r,t)$. We thus have:

$$(I_\sigma,r,t) \models C_G\varphi \text{ iff } (I_\sigma,r',t') \models \varphi \text{ for all } (r',t') \text{ reachable from } (r,t).$$
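The reachability characterization turns common knowledge on a finite set of points into a graph search. The following sketch assumes a hypothetical encoding (a finite collection of hashable points, a state function `sigma(p, point)`, formulas as predicates on points) and explores the points reachable from a start point by breadth-first search, stopping early at the first reachable point where the fact fails.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Sequence


def common_knowledge(points: Sequence[Hashable],
                     sigma: Callable[[str, Hashable], Hashable],
                     group: Iterable[str],
                     phi: Callable[[Hashable], bool],
                     start: Hashable) -> bool:
    """(I_sigma, start) |= C_G phi iff phi holds at every point reachable from start."""
    members = list(group)
    seen = {start}
    frontier = deque([start])
    while frontier:
        x = frontier.popleft()
        if not phi(x):
            return False  # a reachable point at which phi fails
        for y in points:
            # One step: some member of G is in the same state at x and at y.
            if y not in seen and any(sigma(p, x) == sigma(p, y) for p in members):
                seen.add(y)
                frontier.append(y)
    return True


# A four-point chain: a cannot tell 0 from 1 or 2 from 3; b cannot tell 1 from 2.
STATES = {"a": {0: "a0", 1: "a0", 2: "a1", 3: "a1"},
          "b": {0: "b0", 1: "b1", 2: "b1", 3: "b2"}}
sigma = lambda p, x: STATES[p][x]
not_three = lambda x: x != 3

print(common_knowledge(range(4), sigma, ["a", "b"], not_three, 0))  # False: 0, 1, 2, 3 reachable
print(common_knowledge(range(4), sigma, ["a"], not_three, 0))       # True: only 0 and 1 reachable
```

In this example $E_G\varphi$ does hold at point 0 (a considers only 0 and 1 possible, b only 0), yet $C_G\varphi$ fails there, because the chain 0, 1, 2, 3 of indistinguishable points ends at a point where the fact is false.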

State-based  interpretations  are  used  by  many  researchers  in  the  field  (cf.  [ChM], 
[HF],  [R],  [PR]).  It  is  interesting  to  note  that,  for  the  runs  of  S,  a  state-based 
knowledge  interpretation  for  S  satisfies  all  the  axioms  of  S5.  In  particular  this 
means that a processor’s knowledge is closed under deduction: if the processor knows $\varphi$ and knows that $\varphi \supset \psi$, then it knows $\psi$. Furthermore, processors are fully
introspective:  each  processor  knows  what  facts  it  knows  and  what  facts  it  doesn’t 
know.  (See  Appendix  A  for  the  axioms  of  S5,  and  see  [Hi],  [HM],  [Sa]  and  [FHV] 
for  general  models  for  theories  of  knowledge  satisfying  S5.) 

A state-based interpretation ascribes knowledge to a processor without the processor necessarily being “aware” of this knowledge, and without the processor needing to perform any particular action in order to obtain such knowledge. In particular, if our assignment of states does not distinguish between possibilities at all, i.e., there is a single state $\Lambda$ such that $\sigma(p_j,r,t) = \Lambda$ for all $p_j$, r, and t, the processors are still ascribed quite a bit of knowledge: every fact that is true in all runs at all times is common knowledge to all the processors under this interpretation. It is interesting to note that the hierarchy of Section 1.2 collapses under this interpretation, and $I\varphi \equiv E\varphi \equiv C\varphi$. On the other hand, an interpretation which we will find useful in the sequel is the total view interpretation, which makes the finest possible distinction among views. In the total view interpretation the state $\sigma(p_j,r,t)$ is defined to be $v(p_j,r,t)$, that is, $p_j$’s view at $(r,t)$.

Another popular state-based interpretation is one in which $\sigma(p_j,r,t)$ is defined to be $p_j$’s internal state at $(r,t)$ (recall that processors are state machines). Under
this  interpretation  a  processor  might  “forget”  facts  that  it  knows.  In  particular, 
if  a  processor  can  arrive  at  a  given  state  by  two  different  message  histories,  then, 
once  in  that  state,  the  processor’s  knowledge  cannot  distinguish  between  these  two 
“possible  pasts”.  However,  in  the  total  view  interpretation,  a  processor’s  state 
encodes  all  of  the  processor’s  previous  states,  and  therefore  processors  do  not  forget 
what they know: if a processor knows $\varphi$ at a point $(r,t)$, then at all points $(r,t')$ with $t' > t$ the processor will know that it once knew $\varphi$. Thus, while there may be
temporary  facts  such  as  “it  is  3  on  my  clock”  which  a  processor  will  not  know  at 
4  o’clock,  it  will  know  at  4  o’clock  that  it  previously  knew  that  it  was  3  o’clock. 
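The difference between the internal-state interpretation and the total view interpretation can be seen on a toy example. In this hypothetical sketch there is a single processor whose internal state records only the parity of the number of messages it has received, so two different message histories can lead to the same state; for brevity the view is reduced to the message history alone. The fact of interest is “a ping has been received”.

```python
# Hypothetical points (run, time) and the processor's message history at each.
HISTORY = {
    ("x", 0): (),
    ("x", 1): ("ping",),
    ("x", 2): ("ping", "ping"),
    ("y", 0): (),
    ("y", 1): (),
    ("y", 2): (),
}
POINTS = list(HISTORY)


def internal_state(pt):
    """A state machine that only remembers the parity of messages received."""
    return len(HISTORY[pt]) % 2


def total_view(pt):
    """The total view: the entire message history."""
    return HISTORY[pt]


def knows(sigma, phi, pt):
    """State-based knowledge of the single processor under state assignment sigma."""
    return all(phi(q) for q in POINTS if sigma(q) == sigma(pt))


def got_ping(pt):
    return "ping" in HISTORY[pt]


print(knows(internal_state, got_ping, ("x", 1)))  # True
print(knows(internal_state, got_ping, ("x", 2)))  # False: the fact has been "forgotten"
print(knows(total_view, got_ping, ("x", 2)))      # True
```

At $(x,2)$ the parity is back to 0, the same state as at points where no ping has arrived, so under the internal-state interpretation the processor no longer knows a fact it knew at time 1; the total view keeps the two histories apart.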
Once we have an assignment of states to processors, we can extend this assignment to groups G of processors by taking the state of G at a given point to consist of a description of the states of the members of G at that point. More formally, we define:

$$\sigma(G,r,t) = \{(p_i,\sigma(p_i,r,t)) : p_i \in G\}.$$

It is natural to consider what happens if we ascribe “state-based” knowledge to G. The group G will be said to know every fact known to any of its members, as well as all of their consequences. Thus, the group’s knowledge in this case corresponds to what we have called implicit knowledge in Section 1.2. Recall that we denote this knowledge by $I_G\varphi$, read “the group G has implicit knowledge of $\varphi$”. More formally, we say that $(I_\sigma,r,t) \models I_G\varphi$ iff $(I_\sigma,r',t') \models \varphi$ for all points $(r',t')$ such that $\sigma(G,r,t) = \sigma(G,r',t')$. Thus, in the case of state-based interpretations of knowledge, implicit knowledge provides a modular way of defining the knowledge of elements of a system in terms of their components’ knowledge.⁴
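A two-processor example shows implicit knowledge exceeding what any single member knows. In this hypothetical sketch each point is a pair of bits (x, y), processor a observes only x, and processor b observes only y; the group state is the set of (processor, state) pairs, as in the definition of $\sigma(G,r,t)$.

```python
from itertools import product

POINTS = list(product([0, 1], repeat=2))            # hypothetical points (x, y)
SIGMA = {"a": lambda pt: pt[0], "b": lambda pt: pt[1]}


def group_state(group, pt):
    """sigma(G, r, t): the set of pairs (p, sigma(p, r, t)) for p in G."""
    return frozenset((p, SIGMA[p](pt)) for p in group)


def knows(p, phi, pt):
    return all(phi(q) for q in POINTS if SIGMA[p](q) == SIGMA[p](pt))


def implicit(group, phi, pt):
    """I_G phi: phi holds wherever the whole group is in the same joint state."""
    return all(phi(q) for q in POINTS if group_state(group, q) == group_state(group, pt))


def bits_agree(pt):
    return pt[0] == pt[1]


print(knows("a", bits_agree, (1, 1)))            # False
print(knows("b", bits_agree, (1, 1)))            # False
print(implicit(["a", "b"], bits_agree, (1, 1)))  # True
```

Neither processor alone can rule out a point where the bits differ, but the pair of states together pins down the point (1, 1), so the group implicitly knows that the bits agree.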

2.4  Discussion 

This chapter has presented a general framework for modeling knowledge. An interesting property of our framework is that it uses a model of a distributed system as the (semantic) object relative to which knowledge is defined. Given that in the process of designing protocols for a distributed system one usually starts out from a model of the system, it seems very natural to define knowledge in this way, rather than starting out from some abstract model theory for knowledge (e.g., [Hi], [Sa], [FHV], [HM]) and then having the burden of relating this model to the system. In fact, this idea is by now quite popular in the field (cf. [ChM], [FI], [HF], [PR]). However, work on abstract models of knowledge may often prove relevant to our interests. For example, it can be shown that there is a direct correspondence between state-based knowledge interpretations and a particular well-behaved kind of Kripke structures (cf. [HM]) or knowledge structures (cf. [FHV]). Epistemic interpretations and knowledge interpretations were deliberately defined in a very general way. In any particular application, a particular knowledge interpretation should be chosen. Indeed, the models for knowledge used in [ChM], [HF], and [PR], among others, immediately fit into our framework. We would argue that this framework is sufficiently general to accommodate any reasonable notion of knowledge.

4  The  knowledge  ascribed  to  a  set  of  processes  by  Chandy  and  Misra  in  [ChM]  essentially 
corresponds  to  the  implicit  knowledge  of  its  members,  as  defined  here.  See  also  [R]. 
Given a particular interpretation for knowledge in a particular system, it becomes possible to determine what the system’s state of knowledge at any given point is. It is then possible, for instance, to specify attaining a specific state of knowledge as the goal of a particular communication protocol. For example, the goal of standard handshake protocols (cf. [Gr]) seems to be to guarantee that the sender of a piece of data will repeatedly attempt to send the data (and not discard the data) until it knows that the data was delivered to its destination. Indeed, Papageorgiou (cf. [P]) makes essential use of such specifications, and it is not clear how one would otherwise state the goals of the communication protocols he designs. Thus, it appears that formalisms based on knowledge may prove to be a powerful tool for specifying and verifying protocols. Furthermore, it seems quite reasonable that such a formalism will be readily applicable to the synthesis of protocols and plans. (Temporal logic has already proved somewhat successful in this regard; cf. [CE], [MW].) Halpern and Fagin in [HF] take this idea one step further. They suggest a notion of a knowledge-based protocol, in which the actions performed by a processor are a function of the facts known to the processor. Since a processor’s knowledge is determined by the processor’s view, and since in normal protocols a processor’s actions are a function of the processor’s view, it turns out that in any fixed system an implementable knowledge-based protocol is equivalent to a normal protocol. However, a knowledge-based description of a protocol seems to communicate the protocol at a much higher level. In some cases it seems to focus attention on the basic structure and essential ingredients of the protocol, making it easier to communicate and to port from one system to another. We will find knowledge-based protocols useful for our analysis in Chapters 5 and 6.
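The equivalence between an implementable knowledge-based protocol and a normal protocol in a fixed system can be illustrated by compiling a knowledge test away. In this hypothetical handshake sketch the system is given as a finite list of the (sender view, data delivered?) pairs that can occur, under the assumption that the receiver acknowledges only data it has actually received. The knowledge-based sender keeps sending until it knows the data was delivered, and tabulating its action for each possible view yields an ordinary protocol.

```python
# Hypothetical fixed system: every (sender's view, was the data delivered?) that can occur.
POSSIBLE_POINTS = [
    ((), False),          # nothing heard yet, data lost
    ((), True),           # data arrived, but no acknowledgement has come back
    (("ack",), True),     # the receiver acknowledges only data it has received
]


def knows_delivered(view) -> bool:
    """State-based knowledge: delivery holds at every possible point with this view."""
    return all(delivered for v, delivered in POSSIBLE_POINTS if v == view)


def knowledge_based_sender(view) -> str:
    """Knowledge-based protocol: the action depends only on what the sender knows."""
    return "stop" if knows_delivered(view) else "send data"


# In this fixed system the knowledge test compiles into a table from views to actions,
# i.e., an ordinary protocol.
standard_protocol = {v: knowledge_based_sender(v) for v, _ in POSSIBLE_POINTS}
print(standard_protocol)  # {(): 'send data', ('ack',): 'stop'}
```

The knowledge-based description stays the same if the system changes; for instance, adding the point `(("ack",), False)` (an acknowledgement that can arrive without the data) would make the compiled table keep sending after an acknowledgement.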
As we have seen in the introduction, common knowledge seems to play an important role in certain situations, e.g., in everyday agreements between people and in coordinating actions. In this chapter we address the problem of attaining common knowledge in a distributed environment. Section 3.1 shows that when communication is not guaranteed, no amount of successful communication can bring about common knowledge. Section 3.2 goes on to show that, formally speaking, common knowledge cannot be attained at all in practical distributed systems. Section 3.3 discusses the implications of the results of Section 3.2, and Section 3.4 presents a sense in which something very similar to common knowledge can in fact be attained in practical systems.

3.1  Unreliable  communication 

Following the coordinated attack example, we first consider systems in which communication is not guaranteed. Intuitively, communication is not guaranteed in a system S if messages sent in S might fail to be delivered in an arbitrary fashion, independent of any other event in the system. Making this intuition precise is
somewhat  cumbersome  (cf.  [HF]),  and  we  will  not  attempt  to  do  so  here.  For  our 
purposes,  a  weaker  condition,  which  must  be  satisfied  by  any  reasonable  definition 
of  the  notion  “communication  is  not  guaranteed”,  will  suffice.  Roughly  speaking, 
if  communication  in  the  system  is  not  guaranteed  then  it  must  always  be  possible 
that  all  messages  past  a  certain  point  will  not  be  delivered. 

A run r' is said to extend a point $(r,t)$ if the histories of $(r,t)$ and of $(r',t)$ are identical. More formally, r' extends $(r,t)$ if all processors have the same initial times and initial states in both r and r', and all processors’ message history functions and clock time functions up to (but not including) time t are identical in r and in r'. Given a system S, we say that communication in S is not guaranteed if the following condition holds:

(*) For every run $r \in S$, real time t, processor $p_i$, and set M of messages, there is a run $r' \in S$ that extends $(r,t)$, such that at $(r',t)$ processor $p_i$ does not receive any message from the set M and $p_i$ does receive all the messages $m \notin M$ that it receives at $(r,t)$, all processors $p_j \ne p_i$ receive exactly the same messages at $(r',t)$ as they do at $(r,t)$, and no messages are delivered in r' after time t.
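If a run is summarized only by its log of message deliveries, the run r' promised by condition (*) has a simple description. The sketch below uses a hypothetical encoding (`Delivery`, `cut_run`, and messages identified by their contents are all assumptions) that ignores everything except deliveries; sends and internal steps in r' after time t may differ from those in r, but none of them results in a delivery. It keeps every delivery before t, keeps the deliveries at t except those of messages in M to $p_i$, and drops everything after t.

```python
from typing import List, NamedTuple, Set


class Delivery(NamedTuple):
    time: float
    receiver: str
    message: str


def cut_run(deliveries: List[Delivery], t: float, p_i: str, M: Set[str]) -> List[Delivery]:
    """Delivery log of the run r' whose existence condition (*) requires."""
    return [d for d in deliveries
            if d.time < t
            or (d.time == t and not (d.receiver == p_i and d.message in M))]


r = [Delivery(1.0, "B", "attack at dawn"),
     Delivery(2.0, "A", "ack 1"),
     Delivery(2.0, "C", "status"),
     Delivery(3.0, "B", "ack 2")]
r_prime = cut_run(r, 2.0, "A", {"ack 1"})
print([d.message for d in r_prime])  # ['attack at dawn', 'status']
```

With an empty M, `cut_run` simply truncates the log after time t, which corresponds to the run in which no messages are delivered after t.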

Thus, $(r,t')$ and $(r',t')$ are identical for all $t' < t$, and at time t all processors $p_j \ne p_i$ receive the same messages in r and r', while processor $p_i$ might miss some in r' that it receives in r (namely, those in M). The coordinated attack problem suggests that
when  communication  is  not  guaranteed,  common  knowledge  is  not  attainable.  In 
fact,  the  following  lemma  shows  that  in  such  a  system,  communicated  messages  play 
no  role  in  determining  what  facts  are  common  knowledge  and  when  facts  become 
common  knowledge. 

Lemma 3.1: Let S be a system in which communication is not guaranteed, and let I be a knowledge interpretation for S. Let $r_1, r \in S$ be runs such that r extends $(r_1,t_1)$, and let $t > t_1$. Then for all formulas $\varphi$ it is the case that $(I,r_1,t) \models C\varphi$ iff $(I,r,t) \models C\varphi$.

Proof: Fix $\varphi$. Denote by $r^-$ the run extending $(r_1,t_1)$ in which no messages are delivered after time $t_1$. Notice that it follows from (*) that $r^- \in S$. We will now show that for all runs r extending $(r_1,t_1)$, it is the case that $(I,r,t) \models C\varphi$ iff $(I,r^-,t) \models C\varphi$; since $r_1$ itself extends $(r_1,t_1)$, applying this claim to both $r_1$ and r yields the lemma. The proof is by induction on the number $n(r)$ of messages delivered in r in the interval $[t_1,t)$. The case $n(r) = 0$ is trivial, because all processors have identical views at $(r,t)$ and at $(r^-,t)$. Assume inductively that the claim holds for all runs $r' \in S$ extending $(r_1,t_1)$ with $n(r') \le k$, and assume that $n(r) = k+1$. Let $t' < t$ be the latest time at which a message is delivered in r before time t. Let $p_i$ and m be such that m is delivered to processor $p_i$ at time t' in r. Since communication in the system is not guaranteed, there is a run $r' \in S$ extending $(r,t')$ such that all processors $p_j \ne p_i$ receive the same messages at $(r',t')$ and at $(r,t')$, processor $p_i$ does not receive m at $(r',t')$ (but receives all the other messages it receives at $(r,t')$), and no messages are delivered in r' after t'. Since only k messages are delivered in r' between $t_1$ and t, by the inductive hypothesis $(I,r^-,t) \models C\varphi$ iff $(I,r',t) \models C\varphi$. Let $p_j \ne p_i$ be another processor. By Lemma 2.1, $p_j$ supports $C\varphi$ at $(r',t)$ iff $(I,r',t) \models C\varphi$. However, $p_j$’s view at $(r',t)$ is identical to its view at $(r,t)$. Therefore, $p_j$ supports $C\varphi$ at $(r,t)$ iff $p_j$ supports $C\varphi$ at $(r',t)$. Again by Lemma 2.1 it follows that $(I,r^-,t) \models C\varphi$ iff $(I,r,t) \models C\varphi$, which concludes the proof of the inductive step. The claim follows by induction. □

We say that a formula $\psi$ is undetermined in a system $S$ at a given point $(r, t)$ if it is possible at that point that $\psi$ will never hold. More formally, $\psi$ is undetermined in $S$ at $(r, t)$ if for some run $r' \in S$ extending $(r, t)$, it is the case that for no $t' \geq t$ does $\psi$ hold at $(r', t')$. A formula $\psi$ is said to be determined in $S$ at $(r, t)$ if it is not undetermined. As an easy corollary to Lemma 3.1, we have:
Theorem 3.2: Let $S$ be a system in which communication is not guaranteed such that $C\varphi$ is undetermined in $S$ at $(r_1, t_1)$. If $r \in S$ extends $(r_1, t_1)$ and $t \geq t_1$, then $C\varphi$ does not hold at $(r, t)$.

Proof: From the fact that $C\varphi$ is undetermined in $S$ at $(r_1, t_1)$ it follows that there is a run $r' \in S$ extending $(r_1, t_1)$ such that $C\varphi$ does not hold at $(r', t')$ for any $t' \geq t_1$. Let $r \in S$ be a run that extends $(r_1, t_1)$, and fix $t \geq t_1$. By Lemma 3.1, $C\varphi$ holds at $(r', t)$ iff it holds at $(r_1, t)$ iff it holds at $(r, t)$. Thus, $C\varphi$ does not hold at $(r, t)$. Since $r$ and $t \geq t_1$ were chosen arbitrarily, the theorem follows. □
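To make the definition of an undetermined formula concrete, here is a minimal Python sketch over a hypothetical finite system. The assumptions are ours, not the thesis's: time is discrete, a run is just a tuple of global-state labels, a run extends $(r, t)$ when it agrees with $r$ up to and including time $t$, and "never holds" is read as "does not hold within the finite horizon of the run". The run names and the predicate `both` are illustrative only.

```python
from typing import Callable, Sequence, Tuple

Run = Tuple[str, ...]  # hypothetical global state at times 0, 1, ..., len(run) - 1

def extends(r_new: Run, r: Run, t: int) -> bool:
    """r_new extends the point (r, t): both runs agree up to and including time t."""
    return r_new[: t + 1] == r[: t + 1]

def undetermined(system: Sequence[Run], psi: Callable[[str], bool], r: Run, t: int) -> bool:
    """psi is undetermined at (r, t) if some run of the system extending (r, t)
    never satisfies psi at any time t' >= t.  'Never' is read as 'not within the
    finite horizon of the run', an assumption of this sketch."""
    return any(
        extends(r2, r, t) and not any(psi(state) for state in r2[t:])
        for r2 in system
    )

r_quiet = ("idle", "idle", "idle", "idle")   # no message is ever delivered
r_lost = ("idle", "sent", "sent", "sent")    # first message delivered, the rest lost
r_ok = ("idle", "sent", "both", "both")      # enough messages delivered
S = [r_quiet, r_lost, r_ok]
both = lambda state: state == "both"

print([undetermined(S, both, r_ok, t) for t in range(3)])  # [True, True, False]
```

In this toy system the fact is undetermined at times 0 and 1 of `r_ok`, because a run in which messages are lost still agrees with `r_ok` up to that point; it becomes determined only once `r_ok` has diverged from every such run. The sketch checks the definition only, not Theorem 3.2 itself.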

The proofs of Lemma 3.1 and Theorem 3.2 go through under conditions weaker than communication not being guaranteed. A system $S$ is said to be a system with unbounded message delivery times if the following condition holds:

(**) For all runs $r \in S$, real times $t$ and $t'$, processors $p_i$, and sets $M$ of messages, there is a run $r' \in S$ that extends $(r, t)$ such that at $(r', t)$ processor $p_i$ receives all and only the messages $m \in M$ that it receives at $(r, t)$, all processors $p_j \neq p_i$ receive exactly the same messages at $(r', t)$ as they do at $(r, t)$, and no messages are delivered in $r'$ after time $t$ and before time $t'$.

Asynchronous  systems  are  often  defined  to  be  systems  with  unbounded  message 
delivery  times.  Intuitively,  condition  (**)  says  that  it  is  always  possible  for  no 
messages  to  be  delivered  for  arbitrarily  long  periods  of  time,  whereas  (*)  says  that 
it  is  always  possible  for  no  message  to  be  delivered  at  all  from  some  time  on.  In 
some  sense,  we  can  view  (*)  as  a  limit  case  of  (**).  Not  surprisingly,  it  is  easy  to 
check  that  we  can  replace  (*)  by  (**)  in  the  proofs  of  Lemma  3.1  and  Theorem  3.2, 
so  we  also  get: 

Corollary 3.3: If $S$ is a system with unbounded message delivery times and $C\varphi$ is undetermined in $S$ at $(r_1, t_1)$, then in no run $r \in S$ extending $(r_1, t_1)$ does $C\varphi$ ever hold at a time $t \geq t_1$. □

Returning to the coordinated attack problem, we are now in a position to relate the generals' problem to the problem of attaining common knowledge, and present a simple formal proof of the impossibility of their agreeing to attack. We do this as follows: The description of the coordinated attack problem in Section 1.3 describes a specific state of affairs. Without loss of generality, we denote the real time at which the generals are in this situation by $t_1$. Formally, we consider the generals as processors and their messengers as communication links between them. The generals are assumed to each behave according to some predetermined deterministic protocol; i.e., a general's actions (what messages it sends and whether it attacks) at a given point are a deterministic function of his message history and the time on his clock. In particular, we assume that the generals are following a joint protocol $(P, P')$, where general A follows $P$ and general B follows $P'$. Thus, we identify the generals with a distributed system $S$. The runs of $S$ are simply all possible runs of $(P, P')$ from $t_1$ on.

Proposition 3.4: In the coordinated attack problem, any protocol that guarantees that if either party attacks then they both attack simultaneously is a protocol in which necessarily neither party attacks (!).

Proof: Let $(P, P')$ be a joint protocol for the generals, and assume that it guarantees that no general will attack alone. Since the generals are said not to initially have plans to attack the enemy, we can also assume that $(P, P')$ is such that no general will attack in the absence of any successful communication. Let $t_1$ and $S$ be defined as above. For $p_i \in \{A, B\}$, $r \in S$, and $t \geq t_1$, define $\sigma(p_i, r, t)$ by:


$$\sigma(p_i, r, t) = \begin{cases} \text{attacking} & \text{if } p_i \text{ has started attacking by } (r, t);\\ \text{not attacking} & \text{otherwise.} \end{cases}$$


$\sigma$ is clearly defined as a function of a general's view. Let $\mathcal{I}_\sigma$ be the state-based knowledge interpretation relative to $S$ corresponding to $\sigma$. The system $S$ is clearly one in which communication is not guaranteed, since it is always possible that no messenger will succeed in delivering any message from some point on. In the run $r^- \in S$ in which no messages are successfully delivered, no general will attack. Consider the fact $\psi$ = “both generals are attacking”. Since for all $r \in S$ it is the case that $r^-$ extends $(r, t_1)$, it follows that $\psi$ is undetermined at $(r, t_1)$ for all $r \in S$. Because $C\psi \supset \psi$, the fact that $\psi$ is undetermined at $(r, t_1)$ for all $r \in S$ implies that $C\psi$ is undetermined at $(r, t_1)$ for all $r \in S$. Since $(P, P')$ guarantees that whenever one division attacks both attack simultaneously, it follows that at all points $(r, t) \in S \times [t_1, \infty)$ both generals are ascribed the same state, i.e., $\sigma(A, r, t) = \sigma(B, r, t)$. Consequently, all of the points reachable from a point $(r, t)$ in which the generals are in an attacking state have the property that both generals are in an attacking state. From Section 2.3 we thus have that $C\psi$ holds when the generals attack. Because communication is not guaranteed in $S$ and $C\psi$ is undetermined at $(r, t_1)$, Theorem 3.2 implies that $C\psi$ does not hold at $(r, t)$ for any $t \geq t_1$. It follows that the generals can never attack! □
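The impossibility can be illustrated (not proved) by brute force on a restricted family of protocols. The sketch below assumes, hypothetically, that messengers alternate (first from A to B, then from B to A, and so on); that in "run k" the first k messengers get through and every later one is captured; and that each general attacks once it has received a fixed threshold number of messages, or never (threshold `None`). It checks the weaker property that in every run either both generals attack or neither does. The function names are illustrative.

```python
from itertools import product

def received(k: int):
    """In 'run k' the first k alternating messages (A to B, B to A, ...) are delivered.
    Returns (#messages received by A, #messages received by B)."""
    return k // 2, (k + 1) // 2

def attacks(threshold, count) -> bool:
    # None means "never attack"; thresholds >= 1 mean no attack without communication.
    return threshold is not None and count >= threshold

def is_safe(a, b, max_k: int) -> bool:
    """True if in every run k = 0..max_k either both generals attack or neither does."""
    for k in range(max_k + 1):
        got_a, got_b = received(k)
        if attacks(a, got_a) != attacks(b, got_b):
            return False
    return True

def safe_protocols(max_threshold: int = 5):
    choices = [None] + list(range(1, max_threshold + 1))
    max_k = 2 * max_threshold + 1  # long enough for every finite threshold to be reached
    return [(a, b) for a, b in product(choices, repeat=2) if is_safe(a, b, max_k)]

print(safe_protocols())  # [(None, None)]
```

Every finite threshold is first reached by one general in a run where the other general's next message is lost, so within this family the only "safe" protocol is the one in which nobody ever attacks.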

The requirement of a simultaneous attack in the coordinated attack problem is a very strong one. It seems that real-life generals do not need a protocol that guarantees such a strong condition, and can probably make do with one that guarantees a non-simultaneous attack. Theorem 3.2 does not imply that a protocol for achieving this does not exist. However, in Section 4.2 we will use a variant of this argument to show that no protocol can even guarantee that if one party attacks then the other will eventually attack! We might also consider weakening the condition to one where if one party attacks, then with high likelihood the other will attack. This in fact is achievable. We discuss it in further detail in Section 4.4.

3.2  Reliable  communication 

The previous results show that, in a strong sense, common knowledge is not attainable in a system in which communication is not guaranteed. However, even when communication is guaranteed, common knowledge can be elusive. To see this, consider a system consisting of two processors, R2 and D2, connected by a communication link. R2 and D2 use a common (global) clock, communication (delivery) is guaranteed, and furthermore, it is (say commonly) known that any message sent from R2 to D2 reaches D2 either immediately or after exactly $\varepsilon$ seconds. At time $t_S$, R2 sends D2 a message $m$ that does not contain a time stamp, i.e., does not mention $t_S$ in any way. The message $m$ is delivered to D2 at time $t_D$. Let $\mathit{sent}(m)$ be the fact “the message $m$ has been sent”. D2 does not know $\mathit{sent}(m)$ initially. How does {R2, D2}'s state of knowledge of $\mathit{sent}(m)$ change with time?

At time $t_D$, D2 knows $\mathit{sent}(m)$. Because it might have taken $\varepsilon$ time units for $m$ to be delivered, R2 cannot be sure that D2 knows $\mathit{sent}(m)$ before $t_S + \varepsilon$. D2 knows that R2 will not know that D2 knows $\mathit{sent}(m)$ before $t_S + \varepsilon$, and because for all D2 knows $m$ may have been delivered immediately (in which case $t_S = t_D$), D2 does not know that R2 knows that D2 knows $\mathit{sent}(m)$ before $t_D + \varepsilon$. Now, R2 must wait until $t_S + 2\varepsilon$ before he knows that $t_D + \varepsilon$ has passed. This line of reasoning can be continued indefinitely, and an easy proof by induction shows that before time $t_S + n\varepsilon$, the formula $(K_{R2}K_{D2})^n\,\mathit{sent}(m)$ does not hold, while at $t_S + n\varepsilon$ it does hold. Thus, it “costs” $\varepsilon$ time units to acquire every level of “R2 knows that D2 knows”. Recall that $C(\mathit{sent}(m))$ implies $(K_{R2}K_{D2})^n\,\mathit{sent}(m)$ for every $n$. It follows that $C(\mathit{sent}(m))$ will never be attained!
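The induction claimed above can be checked mechanically for small $n$. The following minimal sketch assumes integer clock ticks with $\varepsilon = 3$, that R2's view is only its send time $t_S$ and D2's view is only the time $t_D$ at which $m$ arrives (both read off the common clock), and that send times range over all integers, so no boundary effect arises. The names `krkd` and `earliest` are hypothetical.

```python
from functools import lru_cache

EPS = 3            # hypothetical value of epsilon, in integer clock ticks
DELAYS = (0, EPS)  # m arrives immediately or after exactly EPS ticks

@lru_cache(maxsize=None)
def krkd(n: int, ts: int, t: int) -> bool:
    """Does (K_R2 K_D2)^n sent(m) hold at global time t in a run where m is sent at ts?

    The actual delay is irrelevant: the outermost operator is K_R2, and R2 never
    observes the delay.  Send times range over all integers (no boundary effects).
    """
    if n == 0:
        return ts <= t                     # the fact sent(m) itself
    if ts > t:
        return False                       # R2 has not sent m, so sent(m) is false
    for d_r in DELAYS:                     # runs R2 cannot distinguish from this one
        t_d = ts + d_r                     # time at which D2 receives m in that run
        if t_d > t:
            # D2 has received nothing; for all it knows m is sent later,
            # so it cannot know any fact that implies sent(m).
            return False
        for d_d in DELAYS:                 # runs D2 cannot distinguish (same receipt time)
            if not krkd(n - 1, t_d - d_d, t):
                return False
    return True


def earliest(n: int, ts: int = 0, horizon: int = 1000):
    """First global time t >= ts at which (K_R2 K_D2)^n sent(m) holds (None if beyond horizon)."""
    for t in range(ts, ts + horizon):
        if krkd(n, ts, t):
            return t
    return None


print([earliest(n) for n in range(5)])  # [0, 3, 6, 9, 12]
```

The earliest times grow by $\varepsilon$ per level, matching $t_S + n\varepsilon$; since $C(\mathit{sent}(m))$ would require every level at once, no finite time reaches it in this model either.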

Now consider what would happen if R2 sends D2 the following message $m'$:

“This message is being sent at time $t_S$ (and will reach D2 by $t_S + \varepsilon$ at the latest); $m$.”

Since they are using a common clock, the fact that R2 sent $m'$ to D2 would become common knowledge at time $t_S + \varepsilon$!
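For contrast, the following minimal sketch models the time-stamped message $m'$ under hypothetical assumptions: integer clock ticks, $\varepsilon = 3$, delays of exactly $0$ or $\varepsilon$, and D2 reading $t_S$ from the stamp on receipt, so that D2's only candidate run is the actual one. It computes, for each level $n$, the first time at which $(K_{R2}K_{D2})^n\,\mathit{sent}(m')$ holds. The names are illustrative.

```python
from functools import lru_cache

EPS = 3            # hypothetical epsilon, in integer clock ticks
DELAYS = (0, EPS)  # m' arrives immediately or after exactly EPS ticks

@lru_cache(maxsize=None)
def krkd_stamped(n: int, ts: int, t: int) -> bool:
    """(K_R2 K_D2)^n sent(m') at global time t, when m' is sent at ts and carries ts."""
    if n == 0:
        return ts <= t                    # the fact sent(m') itself
    if ts > t:
        return False                      # R2 has not sent m' yet
    for d_r in DELAYS:                    # R2 still cannot tell which delay occurred
        if ts + d_r > t:
            return False                  # in that run D2 has not received m' yet
        # D2 reads ts from the stamp (and hence the delay), so its only candidate
        # run is this one: K_D2 chi reduces to chi at (ts, t).
        if not krkd_stamped(n - 1, ts, t):
            return False
    return True


def earliest_stamped(n: int, ts: int = 0, horizon: int = 1000):
    """First global time t >= ts at which level n holds (None if beyond horizon)."""
    for t in range(ts, ts + horizon):
        if krkd_stamped(n, ts, t):
            return t
    return None


print([earliest_stamped(n) for n in range(1, 6)])  # [3, 3, 3, 3, 3]
```

Every level checked appears together at $t_S + \varepsilon$, consistent with the claim that the fact becomes common knowledge at that moment; the sketch checks only finitely many levels of this one nesting pattern.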

What is the essential difference between these two situations? It seems that what made achieving common knowledge easy in the latter case was the possibility of simultaneously making the transition from not having common knowledge to having common knowledge. The impossibility of doing so in the former case was the driving force behind the extra cost in time incurred in attaining extra levels of knowledge. In fact, Lemma 2.1 already implies that when $C\varphi$ first holds, all processors must come to support $C\varphi$ simultaneously. In particular, this means that all of the processors' views must change simultaneously. However, there is a sense in which practical systems cannot guarantee such simultaneity. Intuitively, the uncertainties regarding relative readings of clocks and message transmission times in practical systems imply that there will always be two processors $p_i$ and $p_j$ and two runs such that $p_i$'s behavior in both runs is identical, while $p_j$'s actions in the second run are shifted by $\delta$ time units relative to the first, for some $\delta \neq 0$. (This $\delta$ may, in some cases, be very small.) We now make this claim precise.

Given a system $S$ and a knowledge interpretation $\mathcal{I}$, we say that common knowledge is attainable in $S$ w.r.t. $\mathcal{I}$ if there is a (joint) protocol $P = (P_1, \ldots, P_n)$ executed in $S$ and a fact $\varphi$ such that in all the runs of $S$ in which the processors follow $P$ it is the case that $\varphi$ is not common knowledge before any processor “wakes up” (i.e., $(\mathcal{I}, r, t) \models \neg C\varphi$ for all times $t < \min_j t_0(p_j, r)$), and $C\varphi$ holds at some later point in the run. Thus, roughly speaking, $P$ “creates” common knowledge of $\varphi$.

A system $S$ is said to have essential temporal imprecision if for all (joint) protocols $P = (P_1, \ldots, P_n)$ executed in $S$, there exist runs $r_1, r_2 \in S$ in which $P$ is executed, processors $p_i, p_j$, and a real number $\delta \neq 0$ such that for all $t$ it is the case that $v(p_i, r_1, t) = v(p_i, r_2, t)$ and $v(p_j, r_1, t) = v(p_j, r_2, t + \delta)$.

Dolev, Halpern, and Strong show in [DHS] that a system in which (i) there is an uncertainty regarding the relative “initial times” at which the processors start running, and (ii) there are upper and lower bounds on message transmission times along the links in the system, with the upper bounds strictly larger than the lower bounds, is a system with essential temporal imprecision (even if the processors' clocks are guaranteed to run at the same rate!). It can thus be argued that all practical distributed systems have essential temporal imprecision.

As  an  easy  consequence  of  the  definitions,  we  now  have: 

Theorem 3.5: If $S$ is a system with essential temporal imprecision, then common knowledge is not attainable in $S$.

Proof: Assume the contrary, and let $P = (P_1, \ldots, P_n)$ be a protocol executed in $S$ and let $\varphi$ be a fact such that $C\varphi$ does not initially hold in the runs of $P$ and such that $C\varphi$ does hold at some point in all runs $r \in S$ in which the processors execute $P$. Given that there is essential temporal imprecision in $S$, let $p_i$, $p_j$, $\delta$, $r_1$ and $r_2$ be as in the definition above. In particular, the processors follow $P$ in $r_1$ and in $r_2$. Lemma 2.1 implies that in all runs of $P$, processors $p_i$ and $p_j$ start supporting $C\varphi$ simultaneously (i.e., at the same real time). Let $t'$ be the (real) time at which $p_i$ and $p_j$ first support $C\varphi$ in $r_1$. In particular, $p_j$ does not support $C\varphi$ at any earlier time in $r_1$. Since $v(p_i, r_1, t) = v(p_i, r_2, t)$ for all $t$, processor $p_i$ first starts supporting $C\varphi$ at time $t'$ in $r_2$ as well. However, since $v(p_j, r_1, t) = v(p_j, r_2, t + \delta)$ for all $t$, processor $p_j$ first starts supporting $C\varphi$ in $r_2$ at time $t' + \delta \neq t'$. It follows that $p_i$ and $p_j$ do not start supporting $C\varphi$ at the same time in $r_2$, contradicting Lemma 2.1. □
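The heart of this proof is that any rule depending only on a processor's view (such as "supports $C\varphi$") fires at times that shift together with the view. The hypothetical sketch below assumes discrete time and models a view as the sequence of local events seen so far; it builds two runs in which $p_i$ behaves identically and $p_j$ is shifted by $\delta = 1$, and shows that two view-based rules that fire simultaneously for both processors in $r_1$ no longer do so in $r_2$.

```python
from typing import Callable, List, Tuple

Events = List[Tuple[int, str]]  # (real time, local event) pairs for one processor
View = Tuple[str, ...]

def view(events: Events, t: int) -> View:
    """Hypothetical view: the sequence of local events seen by real time t."""
    return tuple(e for time, e in sorted(events) if time <= t)

def shifted(events: Events, delta: int) -> Events:
    """The same local behaviour, performed delta time units later."""
    return [(time + delta, e) for time, e in events]

def first_support(rule: Callable[[View], bool], events: Events, horizon: int = 100):
    """First real time at which a view-based rule (e.g. 'supports C phi') fires."""
    for t in range(horizon):
        if rule(view(events, t)):
            return t
    return None

DELTA = 1
p_i = [(2, "go"), (4, "ack")]      # p_i behaves identically in r1 and r2
p_j_r1 = [(2, "go"), (4, "ack")]
p_j_r2 = shifted(p_j_r1, DELTA)    # v(p_j, r1, t) == v(p_j, r2, t + DELTA)

for rule in (lambda v: "go" in v, lambda v: "ack" in v):
    same_r1 = first_support(rule, p_i) == first_support(rule, p_j_r1)
    same_r2 = first_support(rule, p_i) == first_support(rule, p_j_r2)
    print(same_r1, same_r2)  # True False (for both rules)
```

Whatever view-based rule is chosen, $p_j$'s first firing time in $r_2$ is its time in $r_1$ plus $\delta$, while $p_i$'s is unchanged, which is exactly the mismatch the proof exploits.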

In fact, a stronger notion of imprecision holds in practice: for every run of the system there is another run in which $p_i$'s behavior is identical to that in the given run, and $p_j$'s behavior is shifted by $\delta$. More formally, we say that a system $S$ has temporal imprecision in all runs if for all runs $r \in S$ there exist a run $r' \in S$, processors $p_i$ and $p_j$, and a real number $\delta \neq 0$ such that for all times $t$ it is the case that $v(p_i, r, t) = v(p_i, r', t)$ and $v(p_j, r, t) = v(p_j, r', t + \delta)$. (Although, again, the $\delta$'s involved may be very small.) A proof similar to that of Theorem 3.5 now shows:

Theorem 3.6: If $S$ is a system with temporal imprecision in all runs, $\mathcal{I}$ is a knowledge interpretation for $S$, and $(\mathcal{I}, r, t) \models \neg C\varphi$, then $(\mathcal{I}, r, t') \models \neg C\varphi$ for all $t' \geq t$. □

Theorems 3.5 and 3.6 imply that, strictly speaking, common knowledge cannot be attained in practical distributed systems! In such systems, we have the following situation: a fact $\varphi$ can be known to a processor without being common knowledge, or it can be common knowledge (in which case that processor also knows $\varphi$), but due to (possibly negligible) imperfections in the system's state of synchronization and its communication medium, there is no way of getting from the first situation to the second!

Observe that we can now show that, formally speaking, even people cannot attain common knowledge of any new fact! Consider the father publicly announcing $m$ to the children in the muddy children puzzle. Even if we assume that it is common knowledge that the children all hear whatever the father says and understand it, there remains some uncertainty as to exactly when each child comes to know (or comprehend) the father's statement. Thus, it is easy to see that the children do not immediately have common knowledge of the father's announcement. Furthermore, for similar reasons the father's statement can never become common knowledge.
3.3 A paradox?

There is a close correspondence between agreements, coordinated actions, and common knowledge. We have shown that in a precise sense, reaching agreements and coordinating actions in a distributed system amount to attaining common knowledge of certain facts. We also proved that common knowledge cannot be attained in practical distributed systems! However, it is well known that operations such as reaching agreement and coordinating actions are routinely performed in many actual distributed systems, and the designers of such systems do not seem to find it necessary to worry about common knowledge.

Where  is  the  catch?  How  can  we  explain  this  apparent  discrepancy  between  our 
formal  treatment  and  practical  experience?  It  seems  that  our  insistence  on  defining 
knowledge  in  such  a  way  that  facts  that  are  known  must  be  true  at  the  same  instant 
in absolute time is at the root of the problem. After all, absolute time does not seem to be the relevant notion of time in many distributed applications. But does it make any sense to define knowledge in a distributed system in any other fashion? We consider this a major open question. We will touch upon it in the next chapter when we discuss knowledge of facts relative to a relativistic notion of time. Another weakness of the impossibility result is that it relies on a very fine-grained view of practical systems, which forces us to conclude that simultaneity is not attainable. This is similar to the claim that actual bits in a computer do not exclusively contain the values 0 or 1, but can sometimes be in an undefined or incoherent intermediate state. While true, this fact does not seem to have an overwhelming effect on many aspects of computing. It can be taken care of at the hardware level, and for all practical purposes software designers can successfully use a model of the machine in which bits do in fact attain only the values 0 and 1. Similarly, slight abstractions of practical systems that do guarantee simultaneity often model the actual system very well, in which case we may identify the system with its abstraction for all practical purposes. The next section presents a formal argument with a similar flavor.
3.4 Internal knowledge consistency

Strictly speaking, common knowledge is not attainable in practical systems because such systems cannot guarantee that actions at different sites are performed absolutely simultaneously. However, the operations performed in many practical distributed systems do not require events at different sites to be simultaneous. Indeed, it is often the case that from within the system it is not possible to determine the exact relative (real-time) difference between events that occur at different sites. We say that a run of the system is $\delta$-insensitive if, given the information available to the processors, the (real) time difference between any two events that happen at different sites of the system can be determined with precision no better than $\delta$ (cf. [HMM]). A system $S$ is said to be $\delta$-insensitive if all of its runs are. In a $\delta$-insensitive system, it is not possible to distinguish from within the system between the clocks being perfectly synchronized and the clocks being synchronized to within $\delta$. Thus, it seems that when the clocks in such a system are sufficiently synchronized, treating them as if they were perfectly synchronized should have no noticeable negative effect on the system. The processors' views of the system would be perfectly consistent.

We define an epistemic interpretation $\mathcal{I}$ to be a pseudo-knowledge interpretation for $S$ if for all runs $r \in S$ there is a run $r' \in S$ such that all processors' views in $r$ are identical to their views in $r'$, and $\mathcal{I}$ is knowledge consistent with $r'$. (More formally, for all $r \in S$ there is a run $r' \in S$ such that for all processors $p_j$ and times $t \geq t_0(p_j, r)$ there is a time $t' \geq t_0(p_j, r')$ such that $v(p_j, r, t) = v(p_j, r', t')$.) Notice that a knowledge interpretation for $S$ is in particular also a pseudo-knowledge interpretation for $S$. The converse is not true. However, from within the system it is impossible to determine whether or not a pseudo-knowledge interpretation is in fact a knowledge interpretation. Therefore, basing actions on “pseudo-knowledge” is as good as basing them on true knowledge.

Our discussion above can be formally stated as:

Proposition 3.7: Let $S$ be a set of runs such that in every run $r \in S$ clocks are synchronized to within the insensitivity of $r$ (i.e., for some $\delta$, $r$ is $\delta$-insensitive and clocks are synchronized to within $\delta$ throughout $r$). Let $\mathcal{I}$ be an epistemic interpretation for $S$. Let $S_0 \subseteq S$ be the set of runs of $S$ in which all clocks are perfectly synchronized. If $\mathcal{I}$ is a knowledge interpretation for $S_0$, then $\mathcal{I}$ is a pseudo-knowledge interpretation for $S$. □

Since pseudo-knowledge cannot be distinguished from “real” knowledge, Proposition 3.7 shows that if clocks are synchronized to within the insensitivity of the system, then we might as well work under the assumption that all clocks are perfectly synchronized. Thus, the reasoning is essentially carried out in a simpler abstract model of the system, in which the processors have a global clock. The proposition shows that such reasoning will be internally knowledge consistent: every fact that a processor “knows” will be consistent with the processor's view of the system.

Consider the children hearing their father announce $m$. As we have argued in Section 3.2, they do not truly attain common knowledge of $m$. However, the uncertainty regarding when they comprehend the father's statement is so small that they cannot in practice tell whether or not they in fact came to know $m$ simultaneously; they can thus be said to attain pseudo-common knowledge of $m$. Working under the assumption that they have common knowledge of this fact will never fail them. This explains how people can routinely rely on the assumption that they attain common knowledge of a multitude of facts without suffering any negative consequences.
Lemma 2.1 and the proof of Proposition 3.4 show that the state of common knowledge is closely related to events that are guaranteed to occur simultaneously at different sites of the system. Analyzing the attainability of common knowledge in various systems can be used to study the ability to perform various simultaneous coordinated actions. In fact, Chapter 6 uses such an analysis to study the design of fault-tolerant protocols for simultaneous actions in systems of unreliable processors. However, most coordinated actions that are carried out in a distributed system are not required to take place simultaneously. This chapter considers states of knowledge related to common knowledge that similarly correspond to other levels of coordination of actions in a distributed system. In many systems of interest, these states of knowledge are much more easily attainable than common knowledge. Section 4.1 starts out by reconsidering the state of common knowledge under state-based interpretations, providing us with the necessary machinery for defining many related states of knowledge. Section 4.2 studies the states of knowledge that arise from broadcasting a message using synchronous and asynchronous channels, and relates them to coordinated actions that are not guaranteed to be performed at all sites simultaneously. Section 4.3 considers the effect of defining knowledge relative to a relativistic notion of time, and compares the states of knowledge arising there with those that are defined relative to absolute time. Finally, Section 4.4 looks at the states of knowledge arising when actions are only likely to be coordinated, capturing among other things the state of knowledge that actually arises in the coordinated attack problem.
4.1 Common knowledge revisited

Throughout this chapter we will restrict our attention to state-based interpretations of knowledge, since they seem to be the most appropriate for the kinds of applications we will be interested in. It is useful to start by reconsidering the notion of common knowledge under such interpretations, from a slightly different point of view. Recall the children's state of knowledge of the fact $m$ in the muddy children puzzle. If we assume that it is common knowledge that all children comprehend $m$ simultaneously, then after the father announces $m$, the children attain $Cm$. However, when they attain $Cm$ it is not the case that the children learn an infinite collection of facts of the form $E^n m$ separately. Rather, after the father speaks, the children are in a state of knowledge $\Psi$ characterized by the fact that $\Psi$ implies that $m$ holds and that every child knows that $\Psi$ holds. Thus, $\Psi$ satisfies the equation


$$\Psi = m \wedge E\Psi.$$


The  following  Theorem  shows  that  this  phenomenon  arises  in  quite  the  same 
way  in  state-based  knowledge  interpretations: 

Theorem 4.1: Let $\mathcal{I}_\sigma$ be a state-based interpretation for $S$. Then for all runs $r \in S$ and times $t$ it is the case that

$$(\mathcal{I}_\sigma, r, t) \models C_G\varphi \equiv \varphi \wedge E_G C_G\varphi.$$

Proof: The formulas $C_G\varphi \supset \varphi$ and $E_G C_G\varphi \supset C_G\varphi$ are clearly valid for all knowledge interpretations. It remains to show that $C_G\varphi \supset E_G C_G\varphi$ is valid. Since $\mathcal{I}_\sigma$ is a state-based knowledge interpretation, we have from Section 2.3 that $\varphi$ is common knowledge at a given point $(r, t)$ iff $\varphi$ holds at all points that are reachable from $(r, t)$. If $\sigma(p_i, r, t) = \sigma(p_i, r', t')$ for some $p_i \in G$, then $(r', t')$ is clearly reachable from $(r, t)$. Furthermore, because reachability is an equivalence relation in our case, any point reachable from $(r', t')$ is also reachable from $(r, t)$, so $C_G\varphi$ holds at $(r', t')$ whenever it holds at $(r, t)$. Since we have from Section 2.3 that $(\mathcal{I}_\sigma, r, t) \models E_G\psi$ iff $(\mathcal{I}_\sigma, r', t') \models \psi$ for all $(r', t')$ such that $\sigma(p_i, r, t) = \sigma(p_i, r', t')$ for some $p_i \in G$, the result directly follows. □

Thus, under a state-based interpretation, $C_G\varphi$ is a “fixpoint” of the $E_G$ operator at which $\varphi$ holds. An equivalent definition for $C_G\varphi$ in the case of a state-based interpretation $\mathcal{I}_\sigma$ is as the weakest solution for $X$ in the equation

$$X = \varphi \wedge E_G X,$$

by which we mean that any solution for $X$ in this equation implies $C_G\varphi$. (Notice that the above equation may have many solutions. For example, both false and $C_G(\varphi \wedge \psi)$ solve it.) $C_G\varphi$ is what is called the greatest fixpoint of this equation.
As  our  discussion  of  common  knowledge  in  the  case  of  the  muddy  children  puzzle 
suggests,  expressing  common  knowledge  as  a  greatest  fixpoint  of  such  an  equation 
seems  to  correspond  more  closely  to  the  way  it  actually  arises.  While  it  is  beyond 
the  scope  of  this  thesis  to  give  a  detailed  formal  semantic  definition  of  a  logic  with 
fixpoint  definitions,  we  sketch  a  semantics  for  a  propositional  state-based  logic  of 
knowledge  with  fixpoints  in  Appendix  B. 
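Theorem 4.1 and the greatest-fixpoint characterization can be compared on a finite example. The following minimal sketch assumes a hypothetical structure with six points and two processors A and B, where `sigma[p][w]` is the local state of processor `p` at point `w`. It computes $C_G\varphi$ once by iterating $X \mapsto \varphi \wedge E_G X$ downward from "true" (which reaches the greatest fixpoint because the operator is monotone and the set of points is finite) and once via reachability, as in Section 2.3.

```python
from typing import Dict, FrozenSet, List, Sequence

Points = FrozenSet[int]
Sigma = Dict[str, List[int]]  # sigma[p][w] = local state of processor p at point w

def everyone_knows(sigma: Sigma, group: Sequence[str], x: Points) -> Points:
    """E_G X: points w such that, for every p in G, X holds wherever p's state equals its state at w."""
    n = len(next(iter(sigma.values())))
    return frozenset(
        w for w in range(n)
        if all(all(v in x for v in range(n) if sigma[p][v] == sigma[p][w]) for p in group)
    )

def common_knowledge_gfp(sigma: Sigma, group: Sequence[str], phi: Points) -> Points:
    """Greatest fixpoint of X = phi AND E_G X, iterating downward from 'true'."""
    n = len(next(iter(sigma.values())))
    x = frozenset(range(n))
    while True:
        nxt = phi & everyone_knows(sigma, group, x)
        if nxt == x:
            return x
        x = nxt

def common_knowledge_reach(sigma: Sigma, group: Sequence[str], phi: Points) -> Points:
    """C_G phi via reachability: phi holds at every point reachable from w."""
    n = len(next(iter(sigma.values())))
    result = set()
    for w in range(n):
        seen, stack = {w}, [w]
        while stack:
            u = stack.pop()
            for p in group:
                for v in range(n):
                    if sigma[p][v] == sigma[p][u] and v not in seen:
                        seen.add(v)
                        stack.append(v)
        if seen <= phi:
            result.add(w)
    return frozenset(result)

sigma = {"A": [0, 0, 1, 1, 2, 2], "B": [0, 0, 1, 2, 2, 3]}
phi = frozenset({0, 1, 2, 3, 4})
print(sorted(common_knowledge_gfp(sigma, ["A", "B"], phi)))    # [0, 1]
print(sorted(common_knowledge_reach(sigma, ["A", "B"], phi)))  # [0, 1]
```

The iteration shrinks $X$ from all six points to $\{0, 1, 2, 3, 4\}$, then $\{0, 1, 2, 3\}$, $\{0, 1, 2\}$, and finally $\{0, 1\}$, the reachability class on which $\varphi$ holds everywhere; on this structure the two computations agree for every choice of $\varphi$.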

In  the  case  of  state-based  knowledge  interpretations,  the  following  axioms  are 
valid  for  common  knowledge  (for  logical  systems  with  these  and  other  axioms,  see 
[Mi],  [Le],  [HM],  and  the  analogous  axioms  for  the  PDL  “*”  operator  in  [KP]): 

(1)  The  “fixpoint”  axiom: 

$$C_G\varphi \supset \varphi \wedge E_G C_G\varphi.$$

(2)  The  “induction”  axiom: 

$$C_G(\varphi \supset E_G\varphi) \supset (\varphi \supset C_G\varphi).$$

(3)  The  “consequence  closure”  axiom: 

$$(C_G\varphi \wedge C_G(\varphi \supset \psi)) \supset C_G\psi.$$

The induction axiom states that if it is common knowledge that whenever $\varphi$ holds everybody knows $\varphi$, then when $\varphi$ holds, $\varphi$ is common knowledge. It is called an induction axiom because from the antecedent $C(\varphi \supset E\varphi)$ we can prove by induction that $\varphi \supset E^n\varphi$ holds for all $n$. Roughly speaking, it traces our line of reasoning when we argued that the children in the muddy children puzzle attain common knowledge of the father's statement.

Another interesting property of common knowledge under state-based interpretations is:

$$\neg C_G\varphi \supset C_G\neg C_G\varphi.$$

Notice that it is possible to deduce this fact from Lemma 2.1 using the induction axiom. Axioms (1)-(3) completely characterize common knowledge in the logical systems of [Le], [HM].
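Validity of the axioms above can be sanity-checked (not proved) by brute force on a small state-based structure. In the following minimal sketch the six-point, two-processor structure is hypothetical; a formula is represented by the set of points where it holds, so that $\alpha \supset \beta$ is valid exactly when the set for $\alpha$ is contained in the set for $\beta$. $C_G$ is computed via reachability, as in Section 2.3. The sketch checks axioms (1)-(3) and the property $\neg C_G\varphi \supset C_G\neg C_G\varphi$ for every choice of $\varphi$ and $\psi$ on this one structure.

```python
# Hypothetical state-based structure: six points, two processors A and B.
# SIGMA[p][w] is processor p's local state at point w.
SIGMA = {"A": [0, 0, 1, 1, 2, 2], "B": [0, 0, 1, 2, 2, 3]}
G = ("A", "B")
N = 6
ALL = frozenset(range(N))

def E(x):
    """E_G x: at w, every p in G has x at all points where its local state is as at w."""
    return frozenset(
        w for w in ALL
        if all(all(v in x for v in ALL if SIGMA[p][v] == SIGMA[p][w]) for p in G)
    )

def component(w):
    """All points reachable from w by chains of 'same local state for some p in G'."""
    seen, stack = {w}, [w]
    while stack:
        u = stack.pop()
        for p in G:
            for v in ALL:
                if SIGMA[p][v] == SIGMA[p][u] and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return frozenset(seen)

COMPONENT = {w: component(w) for w in ALL}

def C(x):
    """C_G x via reachability: x holds at every point reachable from w."""
    return frozenset(w for w in ALL if COMPONENT[w] <= x)

def implies(a, b):
    return (ALL - a) | b

def subsets():
    for mask in range(1 << N):
        yield frozenset(i for i in range(N) if mask >> i & 1)

def check_axioms():
    for phi in subsets():
        assert C(phi) <= phi & E(C(phi))                        # (1) fixpoint
        assert C(implies(phi, E(phi))) <= implies(phi, C(phi))  # (2) induction
        assert ALL - C(phi) <= C(ALL - C(phi))                  # not C phi implies C not C phi
        for psi in subsets():
            assert C(phi) & C(implies(phi, psi)) <= C(psi)      # (3) consequence closure
    return True

print(check_axioms())  # True
```

A single finite structure cannot establish validity in general; the point of the check is only to make the content of each axiom concrete as a set inclusion.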




4.2 $\varepsilon$-common knowledge and $\Diamond$-common knowledge

Since attaining common knowledge in practical distributed systems is problematic, it is natural to ask what states of knowledge can be obtained by the communication process. In this section we consider what states of knowledge are attained in systems in which communication delivery is guaranteed but message transmission times are uncertain. However, before we can do so, we need to extend our language to allow reasoning about time. We introduce temporal operators to the language by adding the following clause to the inductive definition of formulas in the language (cf. Section 2.2): If $\varphi$ is a formula and $\delta$ is a real number, then $\Diamond\varphi$ and $\bigcirc^\delta\varphi$ are formulas. Roughly speaking, $\Diamond\varphi$ stands for “eventually $\varphi$”, while $\bigcirc^\delta\varphi$ stands for “$\delta$ time units from now $\varphi$”. We then extend our semantic definitions so that $(\mathcal{I}, r, t) \models \Diamond\varphi$ iff for some $t' \geq t$ it is the case that $(\mathcal{I}, r, t') \models \varphi$; and $(\mathcal{I}, r, t) \models \bigcirc^\delta\varphi$ iff $(\mathcal{I}, r, t + \delta) \models \varphi$. (This definition corresponds to linear time semantics for temporal logic; cf. [MP].) It is customary to define $\Box\varphi$ (read “always $\varphi$”) as the dual of $\Diamond\varphi$, i.e., $\Box\varphi \stackrel{\mathrm{def}}{=} \neg\Diamond\neg\varphi$.

In dealing with distributed systems in which communication is not instantaneous, we are not so interested in facts whose truth value might change between the time a message regarding them is sent and the time it is received. A fact $\varphi$ is called stable if once true it remains true forever. More formally, $\varphi$ is stable if $\varphi \supset \Box\varphi$ is valid. Notice that given any fact $\psi$, the following facts are stable: "$\psi$ held at some point in the past", "$\psi$ holds and will hold throughout the future", "$\psi$ holds at time $t$ on $p_i$'s clock", and "$\psi$ holds throughout time interval $\Delta$". (For the relevance of stable facts to distributed systems, see also [CL].) Moreover, if $\psi$ is stable then so are $\diamondsuit\psi$ and $\bigcirc^{\delta}\psi$. However, it is not in general the case that knowledge of stable facts is stable. For example, given a general state-based knowledge interpretation $\mathcal{I}$ and a stable fact $\psi$, it is possible that $(\mathcal{I},r,t) \models K_i\psi$ and $(\mathcal{I},r,t') \not\models K_i\psi$ for some $t' > t$. This does not happen in the total view interpretation, since in this case a fact is stable iff all processors know that it is stable, and processors do not forget what facts they knew; in particular, they do not forget stable facts. Under the total view interpretation, if $\varphi$ is stable, then so are $K_i\varphi$, $E\varphi$, $C\varphi$, etc. For the remainder of this section we restrict our attention to the total view interpretation.

We begin by considering synchronous broadcast channels of communication: every message sent is delivered to all processors in the system (including the sender), and processors receive the message up to $\varepsilon$ units of real time apart; $\varepsilon$ is called the broadcast spread of such a channel. Recall that the properties of the system are common knowledge to all the processors in the system. In particular, the properties of the broadcast channel are common knowledge. Let us now consider the state of knowledge of the system (under the total view interpretation) when a processor $p_i$ receives a broadcast message $m$. $p_i$ knows sent(m), but he knows more: $p_i$ also knows that within $\varepsilon$ time units everyone will (receive $m$ and) know sent(m). But he knows even more: $p_i$ also knows that within $\varepsilon$ everyone will know sent(m) and they will all know that within another $\varepsilon$ everyone will know sent(m). This argument can be continued, and it leads us to the notion of $\varepsilon$-common knowledge, denoted $C^{\varepsilon}$.

$C^{\varepsilon}\varphi$ is defined to be the greatest fixpoint of the equation:

$$X \equiv \varphi \wedge \bigcirc^{\varepsilon}EX.$$

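Again, we refer the reader to Appendix B for a rigorous definition. However, as the above discussion suggests, $C^{\varepsilon}\varphi$ can also be equivalently expressed by the following infinite conjunction:

$$C^{\varepsilon}\varphi \equiv \varphi \wedge \bigcirc^{\varepsilon}E\varphi \wedge \bigcirc^{\varepsilon}E(\bigcirc^{\varepsilon}E\varphi) \wedge \cdots \wedge (\bigcirc^{\varepsilon}E)^{m}\varphi \wedge \cdots.$$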


For any message $m$ delivered in an $\varepsilon$-spread synchronous broadcast channel, sent(m) becomes $\varepsilon$-common knowledge as soon as $m$ is delivered to any processor. Returning to R2 and D2's communication problem, we can view them as a synchronous broadcast system, and indeed they attain $C^{\varepsilon}\mathit{sent}(m)$ immediately when R2 sends the message $m$. The interested reader is invited to check that R2 and D2 in fact achieve $\varepsilon/2$-common knowledge of sent(m) at time $t_s + \varepsilon/2$.
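The claim that sent(m) is $\varepsilon$-common knowledge as soon as $m$ is sent can be checked on a small discrete model. The sketch below is a toy model under stated assumptions, not the formal system of this thesis: time is discrete, clocks show real time, R2 sends $m$ at one of the times $0, \ldots,$ `LAST_SEND` (or never), and D2 receives it after a delay between 0 and `EPS`; each processor's view records only the current time and the time at which it saw $m$. `EPS`, `LAST_SEND`, `HORIZON` and all function names are hypothetical. Because the model is finite, $\bigcirc^{\varepsilon}$ is treated as true at points whose successor lies past `HORIZON`.

```python
# Toy model (all parameters are assumptions for illustration): R2 broadcasts m,
# "receives" it when sending, and D2 receives it within EPS time units.
# Time is discrete and both clocks show real time.
from itertools import product

EPS = 2          # broadcast spread (assumed)
LAST_SEND = 3    # latest possible send time (assumed)
HORIZON = 12     # last time point modelled (finite cut-off)

# A run is (send_time, delay); send_time None means m is never sent.
RUNS = [(None, 0)] + list(product(range(LAST_SEND + 1), range(EPS + 1)))
POINTS = [(run, t) for run in RUNS for t in range(HORIZON + 1)]


def view(agent, point):
    """The current time plus the time the agent saw m (None if not yet)."""
    (s, d), t = point
    seen = None
    if s is not None:
        event = s if agent == "R2" else s + d
        seen = event if event <= t else None
    return (t, seen)


def knows(agent, xs):
    """K_agent xs: points whose whole view-equivalence class lies inside xs."""
    cells = {}
    for p in POINTS:
        cells.setdefault(view(agent, p), []).append(p)
    return {p for cell in cells.values() if all(q in xs for q in cell) for p in cell}


def everyone_knows(xs):
    return knows("R2", xs) & knows("D2", xs)


def next_eps(xs):
    """Next-eps operator; points whose successor lies past HORIZON are treated
    as satisfying it (an artifact of the finite cut-off)."""
    return {(r, t) for (r, t) in POINTS if t + EPS > HORIZON or (r, t + EPS) in xs}


def eps_common_knowledge(phi):
    """Greatest fixpoint of X = phi AND next_eps(E X), iterating down from all points."""
    x = set(POINTS)
    while True:
        nxt = phi & next_eps(everyone_knows(x))
        if nxt == x:
            return x
        x = nxt


SENT = {((s, d), t) for ((s, d), t) in POINTS if s is not None and s <= t}
C_EPS_SENT = eps_common_knowledge(SENT)
E_SENT = everyone_knows(SENT)
```

In this model the greatest fixpoint coincides with sent(m) itself, even at points where D2 has not yet received $m$ and $E\,\mathit{sent}(m)$ therefore fails.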

In a system in which all clocks run at the rate of real time, if a processor knows that $\varphi$ will hold "tomorrow", for an arbitrary fact $\varphi$, then (under the total view interpretation) "tomorrow" the processor knows that $\varphi$ holds; i.e., the processors' knowledge satisfies the axiom $K_i\bigcirc^{\delta}\varphi \supset \bigcirc^{\delta}K_i\varphi$. (This is also the case in the logical systems of [Sa] and [Le].) Similar statements hold for $\bigcirc^{\varepsilon}$ and $E$. In this case $(\bigcirc^{\varepsilon}E)^{k}\varphi \supset \bigcirc^{k\varepsilon}E^{k}\varphi$, and it follows that $C^{\varepsilon}\varphi \supset \bigcirc^{k\varepsilon}E^{k}\varphi$. So, if $C^{\varepsilon}\varphi$ holds in such a system, then $E^{k}\varphi$ will eventually hold (within $k\varepsilon$ time units, to be precise).

It is now interesting to compare common knowledge and $\varepsilon$-common knowledge. Whereas $C\varphi$ is a static state of knowledge, which can be true of a point in time irrespective of its past or future, $C^{\varepsilon}\varphi$ is an essentially temporal notion. Whether or not it holds depends on what processors will know within $\varepsilon$, within $2\varepsilon$, etc. In fact, since $C^{\varepsilon}\varphi \supset \varphi$ and $C^{\varepsilon}\varphi \supset \bigcirc^{\varepsilon}C^{\varepsilon}\varphi$, if $C^{\varepsilon}\varphi$ holds, then $\varphi$ (as well as $C^{\varepsilon}\varphi$) must hold $n\varepsilon$ time units from now, for all $n > 0$. Note that if $\varphi$ is a stable fact then $C\varphi \supset C^{\varepsilon}\varphi$. As we have shown, there are cases where a stable fact becomes $\varepsilon$-common knowledge and cannot become common knowledge (e.g., R2 and D2 attain $C^{\varepsilon}\mathit{sent}(m)$, and cannot attain $C(\mathit{sent}(m))$). Thus, for stable facts, common knowledge is strictly stronger than $\varepsilon$-common knowledge. The axioms (1)-(3) of Section 4.1 remain true of $C^{\varepsilon}\varphi$ when we replace $E$ by $\bigcirc^{\varepsilon}E$ and $C$ by $C^{\varepsilon}$. Furthermore, both $C$ and $C^{\varepsilon}$ are conjunctive, i.e., the formulas $C\varphi \wedge C\psi \supset C(\varphi \wedge \psi)$ and $C^{\varepsilon}\varphi \wedge C^{\varepsilon}\psi \supset C^{\varepsilon}(\varphi \wedge \psi)$ are valid. (Thus, in particular, the uncertainty about information sent in separate messages in a synchronous broadcast channel need not be any worse than the broadcast spread.) However, whereas the formula $\neg C\varphi \supset C\neg C\varphi$ is valid for common knowledge, its analogue $\neg C^{\varepsilon}\varphi \supset C^{\varepsilon}\neg C^{\varepsilon}\varphi$ is not valid.

Just as common knowledge is closely related to simultaneous actions in a distributed system, $\varepsilon$-common knowledge is closely related to actions that are guaranteed to be performed within $\varepsilon$ time units of one another. For example, in an "early stopping" protocol for Byzantine agreement (cf. [DRS]), all members of $G$ are guaranteed to decide on a common value within $\varepsilon$ time units of each other. It follows that once the first processor decides, the decision value is $\varepsilon$-common knowledge in $G$.

Our discussion of knowledge in a distributed system is motivated by the fact that we can view processors' actions as being based on their knowledge. Consider an "eager" epistemic interpretation $\mathcal{I}$ under which a processor that receives an $\varepsilon$-broadcast message $m$ immediately supports $C(\mathit{sent}(m))$. Clearly, $\mathcal{I}$ is not a knowledge interpretation, because it is not knowledge consistent (a processor might be said to "know" that another processor knows sent(m) when in fact the other processor does not!). However, once the last processor receives $m$, which happens at most $\varepsilon$ time units after the first processor starts supporting $C(\mathit{sent}(m))$, it is easy to see that $C(\mathit{sent}(m))$ does indeed hold! In a sense, Lemma 2.1 says that attaining common knowledge requires a certain kind of "natural birth": it is not possible to attain it consistently unless simultaneity is attainable. But if one is willing to give up knowledge consistency (i.e., abandon the $K_i\varphi \supset \varphi$ axiom) for short intervals of time, something very similar to common knowledge can be attained.

The period of up to $\varepsilon$ time units in which the processors' "knowledge" is inconsistent might have many negative consequences. If the processors need to act based on whether $C(\mathit{sent}(m))$ holds during that interval, they might not act in an appropriately coordinated way. This is a familiar problem in the context of distributed database systems. Committing to a transaction roughly corresponds to joining an agreement that the transaction has taken place in the database. There, different sites of the database commit to transactions at different times (although all within a small time interval). When a new transaction is being committed, there is a "window of vulnerability" during which different sites might project inconsistent views of the database. However, once all sites commit to the transaction, the view of the database that the sites project becomes consistent (at least until the next transaction).
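The length of this inconsistency window can be illustrated with a few hypothetical receipt times. The sketch below assumes that every processor adopting the eager interpretation starts supporting $C(\mathit{sent}(m))$ at the moment it receives $m$; the function names and the times in `RECEIPTS` are illustrative only.

```python
# Sketch: when is an "eager" claim of C(sent(m)) actually correct?
# Receipt times are hypothetical; their spread plays the role of epsilon.
from typing import Dict, Tuple


def eager_claim_is_sound(receive_times: Dict[str, float], t: float) -> bool:
    """Under the eager reading, a processor that has received m by time t
    supports C(sent(m)). The claim is sound at t if no processor supports it
    yet, or if every processor has already received m."""
    someone_supports = any(r <= t for r in receive_times.values())
    everyone_received = all(r <= t for r in receive_times.values())
    return not someone_supports or everyone_received


def window_of_vulnerability(receive_times: Dict[str, float]) -> Tuple[float, float]:
    """Half-open interval [first receipt, last receipt) in which eager claims
    are unsound; its length is at most the broadcast spread."""
    return min(receive_times.values()), max(receive_times.values())


RECEIPTS = {"p1": 0.0, "p2": 0.4, "p3": 0.9}   # spread of 0.9 time units
```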

Having discussed states of knowledge in synchronous broadcast channels, we now turn our attention to systems in which communication is asynchronous: there is no bound on the delivery times of messages in the system. Consider the state of knowledge of sent(m) in a system in which $m$ is broadcast over an asynchronous channel, that is, a channel that guarantees that every message broadcast will eventually reach every processor. Upon receiving $m$, a processor knows sent(m); it also knows that eventually every other processor will receive $m$ and know sent(m), and that eventually every other processor will receive $m$ and know this, and so on.

This  state  of  knowledge,  where  it  is  common  knowledge  that  if  m  is  sent  then 
everyone  will  eventually  know  that  m  has  been  sent,  gives  rise  to  a  weak  state  of 
group  knowledge  which  we  call  eventual  common  knowledge. 

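Recall that the temporal logic symbol $\diamondsuit$ stands for "eventually". We denote $\diamondsuit K_i\varphi$ by $K_i^{\diamondsuit}\varphi$ and define $E^{\diamondsuit}\varphi \equiv \bigwedge_{p_i \in G} K_i^{\diamondsuit}\varphi$. (Note that in the total view interpretation, where processors do not forget stable facts, if $\varphi$ is stable then $E^{\diamondsuit}\varphi \equiv \diamondsuit E\varphi$.) $\diamondsuit$-common knowledge (read eventual common knowledge), denoted by $C^{\diamondsuit}$, is defined as (the greatest fixpoint of):

$$C^{\diamondsuit}\varphi \equiv \varphi \wedge E^{\diamondsuit}C^{\diamondsuit}\varphi.$$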

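The axioms (1)-(3) from Section 4.1 are also valid for $C^{\diamondsuit}\varphi$ (once we replace $E$ by $E^{\diamondsuit}$ and $C$ by $C^{\diamondsuit}$). As with $C^{\varepsilon}$, the formula $\neg C^{\diamondsuit}\varphi \supset C^{\diamondsuit}\neg C^{\diamondsuit}\varphi$ is not valid. Note that $E^{\diamondsuit}E^{\diamondsuit}\varphi$ does not imply $\diamondsuit E^{2}\varphi$, so the fact that $C^{\diamondsuit}\varphi$ holds does not imply that $E^{2}\varphi$ will ever hold.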

Given our experience with $C$ and $C^{\varepsilon}$, it would be natural to conjecture that $C^{\diamondsuit}\varphi$ is equivalent to $\varphi \wedge \diamondsuit E\varphi \wedge \diamondsuit E\diamondsuit E\varphi \wedge \cdots$. However, this infinite conjunction is strictly weaker than $C^{\diamondsuit}\varphi$. The reason is that $\diamondsuit$ and $\bigcirc^{\varepsilon}$ do not interact with infinite conjunctions in the same way: if each of an infinite number of facts is guaranteed to hold $\varepsilon$ time units from now, then their conjunction will also hold $\varepsilon$ time units from now. However, if an infinite number of facts are each guaranteed to hold eventually, then it is not necessarily the case that at any given time in the future they will all hold simultaneously.¹
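A simple instance of this difference, using an illustrative family of facts that does not appear in the text: in a run $r$, let $\chi_k$ be the stable fact "at least $k$ time units have elapsed since time 0", for $k = 0, 1, 2, \ldots$. Then:

$$
\begin{aligned}
&(r,0) \models \textstyle\bigwedge_{k} \diamondsuit\chi_k, \quad\text{since each } \chi_k \text{ holds from time } k \text{ on;}\\
&(r,0) \not\models \diamondsuit\textstyle\bigwedge_{k} \chi_k, \quad\text{since at any time } t \text{ the fact } \chi_k \text{ fails for every } k > t;\\
&\textstyle\bigwedge_{k} \bigcirc^{\varepsilon}\chi_k \equiv \bigcirc^{\varepsilon}\textstyle\bigwedge_{k} \chi_k, \quad\text{since both refer to the single time } t+\varepsilon.
\end{aligned}
$$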

An easy consequence of the definition of $C^{\diamondsuit}\varphi$ is that if a processor knows $C^{\diamondsuit}\varphi$, then $C^{\diamondsuit}\varphi$ holds and all the processors eventually know it. As common knowledge corresponds to simultaneous events and $\varepsilon$-common knowledge to events that occur within $\varepsilon$ of each other, $\diamondsuit$-common knowledge corresponds to events that are guaranteed to happen at all sites eventually. For example, in some of the work on variants of the Byzantine agreement problem discussed in the literature (cf. [DDS]), the kind of agreement sought is one in which whenever a correct processor decides on a given value, each other correct processor is guaranteed to eventually decide on the same value. The state of knowledge of the decision value that the processors can be said to attain in such circumstances is precisely $\diamondsuit$-common knowledge. Also, in asynchronous broadcast channels, sent(m) is $\diamondsuit$-common knowledge at the instant $m$ is sent.

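$C^{\diamondsuit}$ is the weakest temporal notion of common knowledge that we have introduced. In fact, we now have a hierarchy of the temporal notions of common knowledge. For a stable fact $\varphi$ and $\varepsilon_1 < \cdots < \varepsilon_n < \varepsilon_{n+1} < \cdots$, we have:

$$C\varphi \supset C^{\varepsilon_1}\varphi \supset \cdots \supset C^{\varepsilon_n}\varphi \supset C^{\varepsilon_{n+1}}\varphi \supset \cdots \supset C^{\diamondsuit}\varphi.$$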


Having defined $C^{\varepsilon}$ and $C^{\diamondsuit}$, it is interesting to ask how these states of knowledge are affected by communication not being guaranteed. Recall that Lemma 3.1 and Theorem 3.2 imply that if communication is not guaranteed, then common knowledge is independent of the communication process: communication does not affect what facts are common knowledge and when facts become common knowledge. Interestingly, an analogue of Lemma 3.1 does not hold for $C^{\varepsilon}$ and $C^{\diamondsuit}$. It is not, in general, the case that $C^{\varepsilon}\psi$ is undetermined at $(r,t)$ iff in the absence of any further communication $C^{\varepsilon}\psi$ will never hold (the same applies to $C^{\diamondsuit}$). For example, consider a system consisting of R2 and D2 connected by a two-way link. Communication along the link is not guaranteed, R2's and D2's clocks are perfectly synchronized, and each of them follows this protocol: at time 0, send the message "OK"; for all $k > 0$, if you have received $k$ "OK" messages by time $k$ on your clock, send an "OK" message at time $k$; otherwise, send nothing. Let $\psi$ = "some message was not delivered within one time unit", and fix $\varepsilon = 1$. Notice that $C^{\varepsilon}\psi$ is undetermined at time 0, since in the fortunate event that all messages are delivered within one time unit, $\psi$ will never hold, and similarly $C^{\varepsilon}\psi$ will never hold. However, if at any point $\psi$ does hold, then so does $C^{\varepsilon}\psi$. For instance, if R2 fails to receive an "OK" message between time $k-1$ and time $k$, then R2 knows $\psi$ at time $k$. R2 therefore does not send an "OK" message at time $k$, and it follows that D2 knows $\psi$ at time $k+1$ at the latest. And because $\psi \supset \bigcirc^{\varepsilon}E\psi$ is guaranteed, it is easy to see that by the induction axiom $\psi \supset C^{\varepsilon}\psi$. (Since $C^{\varepsilon}\psi \supset C^{\diamondsuit}\psi$, the same example shows the claim for $C^{\diamondsuit}\psi$.)

¹ However, $C^{\diamondsuit}\varphi$ is equivalent to a different infinite conjunction of formulas. Define $\Phi_0$ as $\varphi$ and $\Phi_{n+1} = \Phi_n \wedge E^{\diamondsuit}\Phi_n$. Then it is possible to show that $C^{\diamondsuit}\varphi \equiv \bigwedge_{n}\Phi_n$ (cf. the discussion in Appendix B).
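The protocol in the example above can be simulated directly. The sketch below is hypothetical in its details: it assumes perfectly synchronized integer clocks, treats a message that is not delivered within one time unit as lost, and reports for each processor the first time $k$ at which it has received fewer than $k$ "OK" messages, which is when it comes to know $\psi$. `first_time_knowing_psi` is an illustrative name.

```python
# Sketch of the R2/D2 "OK" protocol. Assumptions of this sketch: clocks are
# perfectly synchronized integer clocks, and a message that is not delivered
# within one time unit is treated as lost.
from typing import Dict, Optional, Set, Tuple

Msg = Tuple[str, int]  # (sender, send time)


def first_time_knowing_psi(lost: Set[Msg], horizon: int) -> Dict[str, Optional[int]]:
    """For each processor, the first time k at which it has received fewer than
    k "OK" messages (so it knows psi), or None if that never happens."""
    other = {"R2": "D2", "D2": "R2"}
    sent_at = {"R2": {0}, "D2": {0}}          # both send "OK" at time 0
    knows_psi: Dict[str, Optional[int]] = {"R2": None, "D2": None}
    for k in range(1, horizon + 1):
        will_send = {}
        for p, q in other.items():
            # "OK" messages from q sent before time k and delivered on time.
            received = sum(1 for j in sent_at[q] if j < k and (q, j) not in lost)
            if received < k and knows_psi[p] is None:
                knows_psi[p] = k
            will_send[p] = received == k
        for p, send in will_send.items():      # both decide before either sends
            if send:
                sent_at[p].add(k)
    return knows_psi
```

With a single lost message, the two processors learn $\psi$ exactly one time unit apart, consistent with $\psi \supset \bigcirc^{\varepsilon}E\psi$ for $\varepsilon = 1$.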

Note that in the above example successful communication in a system with unguaranteed communication helped to prevent $C^{\varepsilon}\psi$ (resp. $C^{\diamondsuit}\psi$) from holding. But can successful communication in such a system contribute to $C^{\varepsilon}\varphi$ or $C^{\diamondsuit}\varphi$ coming to hold? A partial analogue of Theorem 3.2 shows that this is not possible:

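Theorem 4.2: Let $\psi$ be a stable fact, let $S$ be a system in which communication is not guaranteed, and let $r^{+}, r^{-} \in S$ be runs such that $r^{-}$ extends $(r^{+}, t^{+})$ and no messages are delivered in $r^{-}$ at or after $t^{+}$. If $(r^{-}, t) \models \neg C^{\varepsilon}\psi$ (resp. $(r^{-}, t) \models \neg C^{\diamondsuit}\psi$) for all $t \geq t^{+}$, then $(r^{+}, t) \models \neg C^{\varepsilon}\psi$ (resp. $(r^{+}, t) \models \neg C^{\diamondsuit}\psi$) for all $t \geq t^{+}$.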

Proof: We sketch the proof for $C^{\diamondsuit}\psi$; the proof for $C^{\varepsilon}\psi$ is analogous. We prove by induction on $n$ that there is no run $r \in S$ that extends $(r^{+}, t^{+})$ in which $C^{\diamondsuit}\psi$ holds at a time $t \geq t^{+}$ and exactly $n$ messages are delivered to their destinations up to (but not including) the time the first processor knows $C^{\diamondsuit}\psi$. The case $n = 0$ follows from our assumption, since as long as no messages are delivered all processors' knowledge is the same as in $r^{-}$, and in $r^{-}$ no processor ever knows $C^{\diamondsuit}\psi$. Suppose we have proved the claim for $n \leq k$, and assume that $r$ extends $(r^{+}, t^{+})$ and attains $C^{\diamondsuit}\psi$ using $k+1$ messages. Let $t$ be the real time at which the first processor (or one of them, in case of a tie), say $p_i$, comes to know $C^{\diamondsuit}\psi$ in $r$. If no message was delivered to $p_i$ before time $t$ in $r$, then $p_i$ knows $C^{\diamondsuit}\psi$ at $(r,t)$ iff $p_i$ knows $C^{\diamondsuit}\psi$ at $(r^{-},t)$, a contradiction. Let $t' < t$ be the latest time before $t$ at which $p_i$ receives a message in $r$, and let $m$ be one of the messages $p_i$ receives at $t'$. Let $r'$ be a run that extends $(r,t')$ in which no messages are delivered after time $t'$, and all messages delivered at $(r,t')$ are delivered at $(r',t')$ (such a run exists by (*)). Since $p_i$'s views at $(r,t)$ and at $(r',t)$ are the same, $p_i$ knows $C^{\diamondsuit}\psi$ at $(r',t)$. Thus, in particular, if $p_j \neq p_i$ is another processor, then for some $t'' \geq t$ it is the case that $p_j$ knows $C^{\diamondsuit}\psi$ at $(r',t'')$. Let $r''$ be a run that is identical to $r'$ up to and including time $t'$, except that $p_i$ does not receive the message $m$ at $(r'',t')$, and in which no message is delivered after $t'$. Since $p_j$'s views at $(r',t'')$ and at $(r'',t'')$ are identical, $p_j$ knows $C^{\diamondsuit}\psi$ at $(r'',t'')$. But at most $k$ messages are delivered in $r''$ before $t''$, contradicting the induction hypothesis. ∎

Theorem 4.2 shows that communication cannot be used in order to attain $C^{\diamondsuit}\psi$ or $C^{\varepsilon}\psi$ when these states of knowledge are not guaranteed to hold in the absence of communication. Thus unreliable communication cannot be used for planning and carrying out coordinated actions in a way that guarantees the participation of all sites. This allows us to prove Corollary 4.3, which characterizes the severity of the generals' problem in the coordinated attack example:

Corollary 4.3: In the coordinated attack problem, any protocol that guarantees that if either party attacks then the other party will eventually attack is a protocol in which necessarily neither party attacks.

Proof: The proof is analogous to that of Proposition 3.4. Assume that $(P, P')$ is a joint protocol that guarantees that if either party attacks then they both eventually attack. Let $S$ and $t_0$ be as in the proof of Proposition 3.4. Let $\psi$ = "general A either has started attacking or will eventually attack, and general B either has started attacking or will eventually attack". By the problem description, $C^{\diamondsuit}\psi$ is undetermined at $(r, t_0)$ for all $r \in S$. Because of the properties of $(P, P')$, it is clear that an attacking general knows $C^{\diamondsuit}\psi$ (under the total view interpretation as well as under the interpretation used in the proof of Proposition 3.4). Thus, by Theorem 4.2, the protocol $(P, P')$ guarantees that neither general will ever attack! ∎

Recall that the proofs that unreliable communication cannot affect what facts are common knowledge carried over to (reliable) asynchronous communication. Our proof of Theorem 4.2 clearly does not carry over. In fact, a message broadcast over a reliable asynchronous channel does become eventual common knowledge. However, it is possible to show that asynchronous channels cannot be used in order to attain $\varepsilon$-common knowledge:
Theorem 4.4: Let $\psi$ be a stable fact, let $S$ be a system with unbounded delivery times, let $t \geq t^{+}$, and let $r^{+}, r^{-} \in S$ be runs such that $r^{-}$ extends $(r^{+}, t^{+})$ and no messages are delivered in $r^{-}$ in the interval $[t^{+}, t + \varepsilon)$. If $(r^{-}, t) \not\models C^{\varepsilon}\psi$, then $(r^{+}, t) \not\models C^{\varepsilon}\psi$.

Sketch of Proof: The proof essentially follows the proof of Theorem 4.2, except that the delivery of messages in the runs $r'$ and $r''$ constructed in the course of the proof is delayed until after time $t + \varepsilon$, rather than not happening at all. Details are left to the reader. ∎

Thus, asynchronous communication channels are of no use for coordinating actions that are guaranteed to be performed at all sites within a predetermined fixed time bound.

4.3  Time  stamping:  using  relativistic  time 

Real  time  is  not  always  the  appropriate  notion  of  time  to  consider  in  a  distributed 
system.  Processors  in  a  distributed  system  often  do  not  have  access  to  a  common 
source  of  real  time,  and  their  clocks  do  not  show  identical  readings  at  any  given  real 
time.  Furthermore,  the  actions  taken  by  the  processors  rarely  actually  depend  on 
real  time.  Rather,  time  is  often  used  mainly  for  correctly  sequencing  events  at  the 
different  sites  and  for  maintaining  a  “consistent”  view  of  the  state  of  the  system.  In 
this  section  we  consider  states  of  knowledge  relative  to  relativistic  notions  of  time. 

Consider the following scenario: R2 knows that R2's and D2's clocks differ by at most $\delta$, and that any message R2 sends D2 will arrive within $\varepsilon$ time units. R2 sends D2 the following message $m'$:

"This message is being sent at $t_s$ on R2's clock, and will reach D2 by $t_s + \varepsilon + \delta$ on both clocks; $m$."
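The bound stated in $m'$ can be checked under simple assumptions that the text does not spell out: each clock reads real time plus a constant offset (no drift), the two offsets differ by at most $\delta$, and the delay is anywhere in $[0, \varepsilon]$. The hypothetical helpers below search a grid of skews and delays using exact rational arithmetic.

```python
# Sketch: checking "reaches D2 by t_s + eps + delta on both clocks".
# Assumptions: each clock reads real time plus a constant offset (no drift),
# the two offsets differ by at most delta, and delays lie in [0, eps].
from fractions import Fraction
from itertools import product


def clock_readings_at_arrival(t_s, skew, delay):
    """R2's and D2's clock readings when m' arrives. Only the skew (D2's
    offset minus R2's) matters, so R2's clock is taken to show real time."""
    real_arrival = t_s + delay          # R2 sends when its clock reads t_s
    return real_arrival, real_arrival + skew


def worst_case_reading(t_s, eps, delta, steps=8):
    """Largest clock reading at arrival over a grid of skews in [-delta, delta]
    and delays in [0, eps]."""
    worst = None
    for i, j in product(range(steps + 1), repeat=2):
        skew = delta * Fraction(2 * i - steps, steps)
        delay = eps * Fraction(j, steps)
        reading = max(clock_readings_at_arrival(t_s, skew, delay))
        worst = reading if worst is None else max(worst, reading)
    return worst
```

The worst case is attained when D2's clock is $\delta$ ahead of R2's and the message takes the full $\varepsilon$, so under these assumptions the bound $t_s + \varepsilon + \delta$ cannot be lowered.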

Let us denote $t_s + \varepsilon + \delta$ by $T_0$. Now, at time $T_0$ on his clock, R2 would like to claim that sent(m') is common knowledge. Is it? Well, we know by now that it is not, but it is interesting to analyze this situation. First, let us introduce a relativistic formalism for knowledge, which we call time-stamped knowledge: we denote "at time $T$ on his clock, $p_i$ knows $\varphi$" by $K_i^{T}\varphi$. $T$ is said to be the time stamp associated with this knowledge. We then define

$$E^{T}\varphi \equiv \bigwedge_{p_i \in G} K_i^{T}\varphi.$$

$E^{T}\varphi$ corresponds to everyone knowing $\varphi$ individually at time $T$ on their own clocks. Notice that for $T_0$ as above, $\mathit{sent}(m') \supset E^{T_0}\mathit{sent}(m')$. It is natural to define the corresponding relativistic variant of common knowledge, $C^{T}$, which we call time-stamped common knowledge:

$$C^{T}\varphi \equiv \varphi \wedge E^{T}C^{T}\varphi.$$

So, once $m'$ is sent, R2 and D2 have time-stamped common knowledge of sent(m') with time stamp $T_0$. It is easy to check that $C^{T}$ satisfies axioms (1)-(3) of Section 4.1. Interestingly, the formula $\neg C^{T}\varphi \supset C^{T}\neg C^{T}\varphi$ is valid. In this respect, $C^{T}$ resembles $C$ more than $C^{\varepsilon}$ and $C^{\diamondsuit}$ do.

It is interesting to investigate how the relativistic notion of time-stamped common knowledge relates to the notions of common knowledge, $\varepsilon$-common knowledge, and $\diamondsuit$-common knowledge. Not surprisingly, the relative behavior of the clocks in the system plays a crucial role in determining the meaning of $C^{T}$.

Theorem 4.5: Under the total view interpretation,

(a) If it is common knowledge that all clocks show identical times, then at $T$ on any processor's clock, $C^{T}\varphi \equiv C\varphi$.

(b) If $\varphi$ is a stable fact and it is $\varepsilon$-common knowledge that all clocks are within $\varepsilon$ time units of each other, then at $T$ on any processor's clock, $C^{T}\varphi \supset C^{\varepsilon}\varphi$.

(c) If $\varphi$ is a stable fact and it is $\diamondsuit$-common knowledge that all clocks read $T$ at some time, then at time $T$ on any processor's clock, $C^{T}\varphi \supset C^{\diamondsuit}\varphi$. ∎

Theorem 4.5 identifies conditions under which the relativistic $C^{T}$ can be interchanged with $C$, $C^{\varepsilon}$, and $C^{\diamondsuit}$. Note that a weak converse of Theorem 4.5 holds as well. Suppose we allowed the processors to set their clocks to a common agreed-upon time $T$ when they come to know $C\varphi$ (resp. $C^{\varepsilon}\varphi$, $C^{\diamondsuit}\varphi$). Then it is easy to see that whenever $C\varphi$ (resp. $C^{\varepsilon}\varphi$, $C^{\diamondsuit}\varphi$) is attainable, so is $C^{T}\varphi$.

In many distributed systems time-stamped common knowledge seems to be a more appropriate notion to reason about than "true" common knowledge. Although common knowledge cannot be attained in practical systems, time-stamped common knowledge is attainable in many cases of interest and seems to correspond closely to the relevant phenomena that protocol designers are confronted with. For example, in distributed protocols that work in phases, we speak of the state of the system at the beginning of phase 2, at the end of phase $k$, and so on. It is natural to think of the phase number as a "clock" reading, and to consider knowledge about what holds in the different phases as "time-stamped" knowledge, with the phase number being the time stamp. In certain protocols for Byzantine agreement, for example, the nonfaulty processors attain common knowledge of the decision value at the end of phase $k$ (see Chapter 6 for details on the relationship between knowledge and the Byzantine agreement problem). In practical systems in which the phases do not end simultaneously at the different sites of the system, the processors actually attain time-stamped common knowledge of the decision value, with the time stamp being "the end of phase $k$".

4.4  Likely  common  knowledge  and  other  variants 

The properties of the communication channels in a system play an important role in determining the type or quality of the information that can be obtained by the processors in a system. In particular, we have seen that common knowledge cannot be attained in many systems of practical interest. Sections 4.2 and 4.3 dealt mainly with the states of knowledge that arise when there is uncertainty regarding when messages are delivered. In systems in which communication is not guaranteed, such as in the case of the coordinated attack problem, it is uncertain whether messages are delivered at all. However, it is often the case that successful communication is highly likely, in that every message broadcast is highly likely to be delivered to all the processors.

First, let us consider systems in which message delivery, when successful, is immediate. The fact that a message broadcast in such a system is likely to be delivered to all processors can be formally captured by the formula $C(\mathit{sent}(m) \supset \mathit{Likely}\;E(\mathit{sent}(m)))$. (The notion of "likely" here is essentially that of [HR].) Theorem 4.2 implies that a message broadcast in such a system never becomes even eventual common knowledge. The notion of common knowledge that the processors in such a system attain when a message is broadcast is called likely common knowledge, denoted $C^{L}$. If we denote "Likely $E\varphi$" by $E^{L}\varphi$, then $C^{L}\varphi$ is defined by:

$$C^{L}\varphi \equiv \varphi \wedge E^{L}C^{L}\varphi.$$

By definition, $C^{L}$ satisfies the fixpoint axiom (1) of Section 4.1. If we denote "Likely $\varphi$" by $L\varphi$, it is not in general the case that likelihood satisfies a consequence closure axiom. Namely, it is not necessarily the case that if both $L\varphi$ and $L(\varphi \supset \psi)$ hold, then $L\psi$ holds (cf. [HR]). As a result, $C^{L}$ satisfies neither the induction axiom (2) nor the consequence closure axiom (3). However, it does satisfy a weak induction axiom:

(2') $C(\varphi \supset E^{L}\varphi) \supset (\varphi \supset C^{L}\varphi)$.

Here, $\varphi \supset E^{L}\varphi$ needs to be common knowledge for $\varphi$ to imply $C^{L}\varphi$. Note that if it is common knowledge that a message is likely to be delivered to all processors immediately, then the weak induction axiom suffices for a processor that receives a broadcast message $m$ to conclude that sent(m) is likely common knowledge.

Another consequence of the fact that $L$ does not satisfy consequence closure is that although $L(\varphi \wedge \psi) \supset L\varphi \wedge L\psi$, the converse does not hold. Thus, the infinite conjunction $\varphi \wedge E^{L}\varphi \wedge (E^{L})^{2}\varphi \wedge \cdots$ is strictly weaker than $C^{L}\varphi$ as defined above. Note that $(E^{L})^{2}\varphi$ $(= LELE\varphi)$ does not necessarily imply $L^{2}E^{2}\varphi$. Indeed, just as with $C^{\diamondsuit}$, $C^{L}\varphi$ does not necessarily imply $L^{2}E^{2}\varphi$. In fact, $E^{2}\varphi$ does not necessarily hold at any level of likelihood!
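The failure of consequence closure and of closure under conjunction can be seen with a concrete reading of "likely". The sketch below assumes, purely for illustration, that "likely" means "probability at least 0.85" over twenty equally likely outcomes; the threshold and the events `phi`, `psi` and `chi` are hypothetical.

```python
# Sketch: "likely" read as "probability at least 0.85" over twenty equally
# likely outcomes. The threshold and the events are illustrative assumptions.
from fractions import Fraction

OUTCOMES = frozenset(range(20))
THRESHOLD = Fraction(17, 20)


def likely(event: frozenset) -> bool:
    return Fraction(len(event), len(OUTCOMES)) >= THRESHOLD


def implies(a: frozenset, b: frozenset) -> frozenset:
    """The event "a implies b", i.e. (not a) or b."""
    return (OUTCOMES - a) | b


phi = frozenset(range(0, 18))   # probability 0.9
psi = frozenset(range(0, 16))   # probability 0.8
chi = frozenset(range(2, 20))   # probability 0.9
```

Here $L\varphi$ and $L(\varphi \supset \psi)$ hold while $L\psi$ fails, and $L\varphi$ and $L\chi$ hold while $L(\varphi \wedge \chi)$ fails.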

In the coordinated attack problem, if (it is common knowledge that) it is likely that a messenger sent from camp A to camp B will deliver the message to camp B within an hour, then any message general A sends general B becomes likely common knowledge an hour after it is sent. Likely common knowledge is a very useful notion. We mentioned earlier that when we say "the president" we assume common knowledge of who the term "the president" refers to. Likely common knowledge actually captures this situation better, since there might be some out-of-touch person who does not know who the president (of the company) is. Thus, when we use a definite reference such as "the president" we are essentially treating a fact that is likely common knowledge as if it were common knowledge. In fact, in practically all the cases in which people believe that they attain common knowledge of a fact, it can be argued that they have only likely common knowledge, where the "likely" qualification corresponds to the (commonly known) likelihood that all the members of the group are aware and clever enough to conclude that the fact is (likely) common knowledge. In some cases, of course, this likelihood may be very close to certainty.

We have defined likely common knowledge for an abstract notion of “likely”. It
is often the case that the degree of likelihood of message delivery is quantified, e.g.,
in terms of a given probability π. (Note that identifying a system with the set S
of its runs facilitates such definitions. The standard techniques of measure theory
and probability theory immediately apply.) Given any specific flavor of likelihood
we can clearly define a variant of likely common knowledge that corresponds to it
as the greatest fixpoint of an appropriate equation. Thus, we may have variants
of common knowledge corresponding to “With probability one”, “With probability
π”, “Unlikely”, and so on. The temporal and likelihood aspects of message delivery
seem  to  be  orthogonal  in  nature.  In  some  cases  it  seems  natural  to  combine  them. 
This  is  useful,  for  example,  in  communication  that  is  characterized  by  a  probability 
distribution.  Consider  a  communication  scheme  in  which  at  any  given  time  step 
each  pending  (as  yet  undelivered)  message  is  delivered  with  probability  1/2.  The 
state of knowledge of a message R2 sends D2 in such a system can be described
by notions of common knowledge corresponding to “with probability 1/2, within
one time step”, “Likely within 100 time steps” (where “likely” here corresponds
to “with probability at least $1 - 2^{-100}$”), “with probability one, eventually”, or “with
geometric probability distribution G(1/2) over time”. These different notions allow
us  to  characterize  different  aspects  of  the  system’s  communication  properties,  and 
to  work  with  different  levels  of  abstraction. 
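To make the quantities behind these labels concrete, here is a small Python sketch (our illustration, not part of the original analysis) for the scheme just described. It assumes that each pending message is delivered independently with probability p = 1/2 at every time step, so that the delivery time follows the geometric distribution G(1/2). The function names are hypothetical.

```python
from fractions import Fraction

def prob_delivered_at(t, p=Fraction(1, 2)):
    """P(the message is delivered exactly at step t), t = 1, 2, ...
    Geometric distribution: t - 1 failed steps followed by one success."""
    return (1 - p) ** (t - 1) * p

def prob_delivered_within(t, p=Fraction(1, 2)):
    """P(the message is delivered at some step 1..t) = 1 - (1 - p)**t."""
    return 1 - (1 - p) ** t

# "With probability 1/2, within one time step"
one_step = prob_delivered_within(1)
# "Likely within 100 time steps": the failure probability is 2**-100
hundred_steps = prob_delivered_within(100)
```

Since `prob_delivered_within(t)` tends to 1 as t grows, the message is delivered eventually with probability one, even though no finite delivery bound holds with certainty.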

4.5  Discussion 

This  chapter  has  introduced  a  variety  of  states  of  knowledge,  discussed  their 
roles,  and  analyzed  the  relationship  between  them.  A  pattern  that  arises  from  the 
presentation  here  is  that  the  properties  of  a  communication  channel  determine  a 
particular  state  of  knowledge  that  is  the  result  of  broadcasting  a  message  over  such 
a channel. Similarly, the relative manner in which an event is guaranteed to oc-
cur in all sites of the system closely corresponds to a related state of knowledge.
Many of these states of knowledge seem to be approximations of common knowl-
edge, accounting for things such as an uncertainty in the relative times in which
events  take  place,  or  an  uncertainty  as  to  whether  they  will  take  place.  (Common 
knowledge  is  attained  by  events  that  are  guaranteed  to  take  place  simultaneously 
at all sites.) Thus, proving that such states of knowledge cannot be attained in
particular circumstances is a general way of proving that coordinated action of the
corresponding kind cannot be guaranteed under the same circumstances. For
instance, our results imply that if communication is not guaranteed, then no
protocol can guarantee to achieve even eventual coordinated attack. Furthermore,
if communication is asynchronous, then no protocol can guarantee that the
generals sometimes attack, and that whenever they attack they do
so  within  a  fixed  predetermined  amount  of  time  of  each  other.  However,  our  results 
are  more  general  than  the  coordinated  attack  problem,  and  they  immediately  apply 
to  problems  such  as  transaction  commit  in  distributed  databases  (cf.  [Gr]). 
As the previous chapters have illustrated, the success of certain cooperative
actions  in  a  distributed  environment  often  depends  on  the  attainment  of  various 
states  of  knowledge  by  the  group  of  agents  involved.  In  this  chapter  we  perform  a 
case  study  of  a  slightly  different  aspect  of  the  subtle  interplay  between  knowledge, 
communication  and  action,  by  analyzing  the  effect  of  broadcasting  a  particular  set 
of  instructions  using  different  communication  channels  in  various  settings.  In  order 
to  enhance  readability,  the  analysis  is  carried  out  within  the  context  of  a  fictional 
story,  and  the  presentation  is  to  a  great  extent  self-contained. 

5.1  Introduction 

The  “cheating  wives”  puzzle,  a  well-known  puzzle  from  the  folklore  (cf.  [GS]), 
has  long  been  one  of  the  primary  examples  of  the  subtle  interdependence  between 
knowledge and action. (We have already presented it as the muddy children puzzle
in Chapter 1.) We will now reveal the contents of recently discovered scrolls,
allegedly written by the great scholar Josephine of the lost continent of Atlantis.
These scrolls describe how modernizing the means of communication in Atlantis
over the generations affected the resolution of the recurring problem of unfaithful
husbands there. They provide some indication of the issues involved in the
interaction between knowledge, action and communication. In particular, one of
the central issues they illustrate is what an agent who knows something about how
other individuals’ actions depend on the facts those individuals know can learn by
observing their actions.

The original cheating husbands problem is re-introduced in section 5.2.¹ Section
5.3 describes what happens when an asynchronous communication channel is
used to communicate the protocol to be followed. Section 5.4 involves different types
of synchronous communication, and includes a discussion of the conditions under
which a “cheating husbands”-like protocol can tolerate “faults” (disobedient wives).
Section 5.5 deals with ring-based communication. Section 5.6 treats the question
of how allowing wives to communicate a small amount of extra information allows
a substantially faster solution to the problem. Some conclusions are presented in
section 5.7.

1 Martin Gardner independently presented this puzzle in terms of “cheating husbands”
in the thoroughly amusing [Gar].

5.2  The  cheating  husbands  puzzle 

Josephine’s  account  of  the  history  of  a  major  city  in  Atlantis  starts  with  the 
following  incident: 

The  queens  of  the  matriarchal  city-state  of  Mamajorca,  on  the  continent  of 
Atlantis,  have  a  long  record  of  opposing  and  actively  fighting  the  male  infidelity 
problem. Ever since the technologically-primitive days of Queen Henrietta I, women
in  Mamajorca  have  been  required  to  be  in  perfect  health  and  pass  an  extensive  logic 
and  puzzle-solving  exam  before  being  allowed  to  take  a  husband.  The  queens  of 
Mamajorca,  however,  were  not  required  to  show  such  competence. 

It  has  always  been  common  knowledge  among  the  women  of  Mamajorca  that 
their  queens  are  truthful  and  that  the  women  are  obedient  to  the  queens.  It 
was  also  common  knowledge  that  all  women  hear  every  shot  fired  in  Mamajorca. 
Queen  Henrietta  I  awoke  one  morning  with  a  firm  resolution  to  do  away  with  the 
male  infidelity  problem  in  Mamajorca.  She  summoned  all  of  the  women  heads-of- 
households  to  the  town  square  and  read  them  the  following  statement: 

There are (one or more) unfaithful husbands in our community. Although
none of you knew before this gathering whether your own husband
was  faithful,  each  of  you  knows  which  of  the  other  husbands  are  unfaithful. 

I  forbid  you  to  discuss  the  matter  of  your  husband’s  fidelity  with  anyone. 
However,  should  you  discover  that  your  husband  is  unfaithful,  you  must 
shoot  him  on  the  midnight  of  the  day  you  find  out  about  it. 

Thirty-nine silent nights went by, and on the fortieth night, shots were heard.

Josephine does not explicitly say how many unfaithful husbands were shot,
how many unfaithful husbands were in Mamajorca at the time, how some cheated
wives learned of their husbands’ infidelity after thirty-nine nights in which nothing
happened, or whether any more husbands were shot on later nights. The interested
reader should stop at this point and try to answer these questions based on
Josephine’s account.

Let  us  consider  the  questions  Josephine  leaves  unanswered.  Since  Henrietta  I  was 
truthful,  there  must  have  been  at  least  one  unfaithful  husband  in  Mamajorca.  How 
would events have evolved if there had been exactly one unfaithful husband? His wife,
upon hearing the queen’s statement, would have concluded that her own husband
was unfaithful, and would have shot him on the midnight of the first night. Clearly,
there must have been more than one unfaithful husband. (Recall that the wives
are all perfect logicians.²) If there had been exactly two unfaithful husbands, then
every  cheated  wife  would  have  initially  known  of  exactly  one  unfaithful  husband, 
and  would  have  reasoned  as  follows:  “If  the  unfaithful  husband  I  know  of  is  the 
only  unfaithful  husband,  then  his  wife  will  shoot  him  on  the  first  night.”  Therefore, 
neither  one  of  the  cheated  wives  would  shoot  on  the  first  night.  On  the  morning 
of  the  second  day  each  cheated  wife  would  realize  that  the  unfaithful  husband  she 
knew  about  was  not  the  only  one,  and  that  therefore  her  own  husband  must  be 
unfaithful.  The  unfaithful  husbands  would  thus  both  be  shot  on  the  second  night. 
In  fact,  similar  reasoning  is  used  by  the  wives  in  general,  and  the  following  theorem, 
well  known  in  the  folklore,  resolves  our  doubts  regarding  Josephine’s  presentation 
of  the  facts: 

Theorem 5.1: If there had been n unfaithful husbands in Mamajorca at the time
Henrietta  I  announced  her  ruling,  they  would  all  have  been  shot  on  the  midnight  of 
the  nth  day. 

Proof: The discussion above shows the claim for n = 1. Assume that the claim
holds for n = k. Thus, if there were k unfaithful husbands they would be shot on
the kth night. We wish to show that if there were n = k + 1 unfaithful husbands they
would have been shot on the (k + 1)st night. Assume therefore that there were k + 1
unfaithful husbands. Every cheated wife knows of exactly k unfaithful husbands.
Because of the wives’ logical competence, they know that if there are exactly k
unfaithful husbands then those husbands will all be shot on the kth night. Before
the kth night, a cheated wife cannot determine that her husband is unfaithful, and
therefore no shots are fired in any of the first k nights. Since the kth night is silent,
every cheated wife concludes on the morning of the (k + 1)st day that there must
be more than k unfaithful husbands, and that her own husband must therefore be
unfaithful. The unfaithful husbands are shot on the (k + 1)st night. The theorem
follows by induction. ■

2 The fact that the wives are perfect reasoners plays a crucial role in all of the cases
we treat. The nature of the situation changes substantially if we relax this assumption,
since wives must then reason about the logical capabilities of other wives. Some
preliminary steps towards dealing with such a situation are presented in [Kon], where
Konolige considers a version of the wise men puzzle (a well-known puzzle that is a special
case of the cheating husbands problem), which he calls the not-so-wise men puzzle, in
which the knowers are not perfect logicians.

Notice the subtlety of the situation: On the first day, immediately after the
queen delivers her statement, a wife who knows of k unfaithful husbands knows
that every cheated wife knows of at least k − 1 unfaithful husbands, and knows that
their wives know of at least k − 2 unfaithful husbands, and that their wives know
of at least k − 3 unfaithful husbands, and so on. It follows that every wife thinks that
it is possible that a cheated wife thinks that it is possible that a cheated wife thinks it is
possible . . . that a cheated wife knows of no unfaithful husbands other than her own.
Thus, for all k > 1, it is not common knowledge that there are at least k unfaithful
husbands. The queen’s statement, however, is common knowledge. This follows
from the fact that the queen announced it publicly, thereby making it common
knowledge that all of the wives heard her announcement.³ It follows that after the
queen speaks, it is common knowledge that there is at least one unfaithful husband.
Given the wives’ famous logical capabilities, it is common knowledge that if there is
only one unfaithful husband then he will be shot on the first night. Therefore, once
the first night is silent it becomes common knowledge that there are at least two
unfaithful husbands. Similarly, after k silent nights (but not earlier!), it is common
knowledge that there are at least k + 1 unfaithful husbands and that every wife
knows of at least k unfaithful husbands other than her own. So although a wife
that knows of k unfaithful husbands knows that there will be no shots before the
kth night, her state of knowledge changes following every silent night, even though
there is no “communication” at all!

3  For  a  discussion  of  this  point,  see  Chapter  1. 
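Theorem 5.1 and the discussion above can be checked mechanically on small instances. The Python sketch below uses a standard possible-worlds encoding, which is our own choice of model rather than anything in Josephine’s account: a world records which husbands are unfaithful, wife i cannot distinguish worlds that differ only in her own husband, the queen’s public statement removes the all-faithful world, and each silent night removes every world in which some wife would have shot. The function names are hypothetical.

```python
from itertools import product

def knows_own_unfaithful(i, world, worlds):
    """Wife i sees every husband but her own, so she cannot tell `world` apart
    from any world in `worlds` that agrees with it everywhere except position i.
    She knows her husband is unfaithful iff he is unfaithful in all of those."""
    return all(v[i] == 1 for v in worlds
               if all(v[j] == world[j] for j in range(len(world)) if j != i))

def shooters(world, worlds):
    """The wives who would shoot tonight if `world` were the actual world."""
    return frozenset(i for i in range(len(world))
                     if knows_own_unfaithful(i, world, worlds))

def first_shots(actual, max_nights=100):
    """Return (night, wives who shoot) for the first night on which shots are
    heard. `actual` has a 1 for each unfaithful husband and a 0 otherwise."""
    if not any(actual):
        raise ValueError("the queen is truthful: someone must be unfaithful")
    # The public announcement makes it common knowledge that at least one
    # husband is unfaithful, so the all-faithful world is discarded.
    worlds = {w for w in product((0, 1), repeat=len(actual)) if any(w)}
    for night in range(1, max_nights + 1):
        fired = shooters(actual, worlds)
        if fired:
            return night, fired
        # Everyone hears the silence: discard every world in which some wife
        # would have shot tonight (computed against the pre-update `worlds`).
        worlds = {w for w in worlds if not shooters(w, worlds)}
    return None
```

For example, with three unfaithful husbands among five households, `first_shots((1, 1, 1, 0, 0))` reports shots on night 3, fired by exactly the three cheated wives, as Theorem 5.1 predicts. After j silent nights, the surviving worlds are exactly those with at least j + 1 unfaithful husbands, which mirrors the growth of common knowledge described above.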
5.3 Asynchronous communication

Josephine’s  description  of  Mamajorca  continues  with  the  following  account: 

Queen Henrietta I was highly regarded by her subjects for her wisdom in running
the monarchy. She ordered her daughters to continue her moral fight against
male  infidelity. 

Her  daughter,  Henrietta  II,  succeeded  her.  In  order  to  facilitate  communication 
with  her  subjects,  Henrietta  II  installed  a  mail  system  from  her  court  to  all  of 
the  households  in  Mamajorca.  Her  first  letter  to  her  subjects  told  them  about  the 
properties  of  the  new  mail  system:  every  letter  she  sends  her  subjects  is  guaranteed 
to  eventually  reach  each  and  every  one  of  them.  Thus,  she  will  not  need  to  gather 
them  in  the  town  square  for  announcements  anymore.  Eager  to  fulfill  her  mother’s 
wish,  Henrietta  II’s  second  letter  to  her  subjects  was  an  exact  copy  of  her  mother’s 
original  statement. 

Henrietta II suffered great disgrace and died in despair. She ordered her daughters
not to repeat her mistake.

Josephine  suggests  that  despite  the  fact  that  Henrietta  II  gave  the  wives  of 
Mamajorca  exactly  the  same  instructions  as  her  mother,  her  mother  was  honored, 
whereas  she  was  disgraced.  Again,  Josephine  refrains  from  explicitly  stating  why 
this  happened.  Let  us  consider  the  possible  outcomes  of  Henrietta  II’s  action.  Had 
there  been  exactly  one  unfaithful  husband  at  the  time,  his  wife  would  have  shot  him 
on  the  first  night  after  receiving  the  queen’s  letter,  and  the  queen  would  have  been 
saved from disgrace. If there had been exactly two unfaithful husbands, however,
each of their wives would know of one unfaithful husband, and would know
that if the husband she knows about is the only unfaithful one, then his wife
will shoot him on the day she receives the letter. Because the mail system is
asynchronous,  with  messages  only  guaranteed  to  be  delivered  eventually,  neither 
wife  would  ever  know  that  the  other  had  already  received  the  queen’s  letter.  Thus, 
neither  wife  would  know  that  her  husband  is  unfaithful:  she  would  always  consider 
it  possible  that  her  own  husband  is  faithful  and  that  the  cheated  wife  she  knows 
about  has  not  shot  yet  because  the  queen’s  letter  has  yet  to  reach  her.  An  immediate 
consequence  of  the  above  argument  is: 

Theorem 5.2: If there is more than one unfaithful husband, and the original
instructions are broadcast over an asynchronous channel, then no unfaithful husbands
are shot. ■
Because the letter is broadcast using an asynchronous channel, the queen’s letter
becomes eventual common knowledge: once the queen sends it, every wife will
eventually receive the letter, and when she does she’ll know that all wives will
eventually receive the letter, and know . . . (see Chapter 3). However, at no time does
a wife know that all other wives have received the letter. Thus, a wife can never
determine whether the silent nights are a result of other wives’ reaction to receiving
the letter or a result of the fact that they have yet to receive the letter. This
property of asynchronous communication comes up in a similar fashion in the analysis
of the Byzantine agreement problem in asynchronous networks (cf. [FLP]). There,
the asynchronous nature of the system prevents a processor from ever determining
whether it has not received messages from another processor because the other
processor did not send any (and thus is faulty), or because the messages are still on
their way.

Notice that even if all of the wives happened to receive the queen’s letter
simultaneously, this would not help. The fact that a wife must always consider it possible
that other wives have not yet received the queen’s letter is sufficient to prevent her
from being able to figure out whether her own husband is unfaithful.

5.4  Synchronous  communication 

Josephine  proceeds  to  describe  the  controversial  actions  that  ensued: 

Henrietta  III  succeeded  her  mother,  Henrietta  II.  She  decided  to  upgrade  the 
mail  system  that  her  mother  had  installed  in  order  to  avoid  her  mother’s  problem. 
Thus,  she  improved  the  mail  system  so  that  any  letter  sent  by  the  queen  was 
guaranteed  to  reach  all  of  her  subjects  no  later  than  one  day  after  it  was  sent. 

Henrietta  III  knew  that  unless  her  subjects  were  aware  of  the  improvement  in 
the mail system, she would repeat her mother’s mistake. Thus, Henrietta III’s first
letter  to  her  subjects  announced  the  new  advances  in  the  mail  delivery  system,  and 
her  second  one  was  an  exact  copy  of  Henrietta  I’s  statement. 

Henrietta  III  was  considered  a  more  effective  monarch  than  her  mother,  but  she 
will  always  be  remembered  for  the  great  injustice  she  brought  upon  Mamajorca. 

If  only  she  had  told  her  subjects  to  wait  a  few  days  before  shooting,  however,  she 
could  have  attained  her  grandmother’s  fame! 

A mail system that guarantees that every letter sent is delivered no more than
b − 1 days after it is sent is called weakly synchronous with bound b. If we call the
sending day the first day, then such a letter is delivered to all wives no later than
on day b. Before we continue, we remark that in Henrietta III’s days no calendar
had been established in Mamajorca.

Let $Ep$ denote “everyone knows $p$”, and

$$E^{m+1}p = E(E^m p), \quad \text{for } m > 0.$$

Notice that an easy proof by induction shows that if there are n unfaithful husbands,
and $E^n$(“the queen sent the letter”) becomes true at some point, then at least one
cheated wife will shoot her husband, and the first shot will be fired at most n days
after $E^n$(“the queen sent the letter”) first holds. In our case, a letter sent by the
queen is guaranteed to be delivered to all of the wives in less than b days. Thus, once
the letter is sent its contents become b-common knowledge: within b days every wife
receives the letter and knows that within b days every wife will receive the letter
and know that within b days . . . every wife will know the contents of the letter (see
Chapter 1). Thus, kb days after the queen sends the letter, $E^k$(“the queen sent the
letter”) holds; taking k = n, it is certain that at least one unfaithful husband will be eliminated.

Although Henrietta III was probably not familiar with the concept of b-common
knowledge,  apocryphal  records  indicate  that  she  was  able  to  prove  the  following 
proposition: 

Proposition 5.3: In the weakly synchronous case with the bound on delivery
being b, a wife that knows of exactly k unfaithful husbands will know that her own
husband is unfaithful once kb silent nights pass after the day she receives the queen’s
letter.

Proof: A wife knowing of k = 0 unfaithful husbands requires kb = 0 silent nights
to conclude that her own husband is unfaithful. By the queen’s statement, that
wife does not know that her husband is unfaithful any earlier than that. Assume
inductively that a wife knowing of k unfaithful husbands requires kb silent nights
to conclude that her own husband is unfaithful, and suppose Mary knows of k + 1
unfaithful husbands. Mary knows that if her own husband is faithful, then every
cheated wife knows of exactly k unfaithful husbands, and, by the induction hypothesis,
will shoot her husband on the following night should kb silent nights go by after
the cheated wife receives the letter. For all Mary knows, it is initially possible that
her husband is faithful, and the letter may reach the first cheated wife to receive
it b − 1 days after Mary receives it. Thus, she must consider it possible that no
shots will be fired before the (k + 1)b-th night after she receives the queen’s letter.
However, should that night be silent, Mary will know that her husband is unfaithful.
The claim follows by induction. ■
Thus, Henrietta III was guaranteed not to suffer her mother’s disgrace. However,
what  she  didn’t  realize  was  that  noisy  nights  might  confuse  some  of  the  wives. 
Consider,  for  example,  the  following  scenario:  The  queen’s  letters  are  guaranteed 
to  arrive  in  less  than  2  days  (i.e.,  b  =  2),  and  Susan  knows  that  Mary’s  husband  is 
unfaithful.  Suppose  Susan  receives  the  queen’s  letter  on  a  Monday,  and  hears  Mary 
shoot  her  own  husband  at  midnight  on  Tuesday  night.  Unfortunately,  now  Susan 
will  not  be  able  to  figure  out  whether  or  not  her  own  husband  is  faithful.  Susan  does 
not  know  whether  the  queen  originally  sent  the  letter  on  Sunday  or  on  Monday,  and 
thus  considers  it  possible  that  Mary  received  the  queen’s  letter  on  either  Sunday, 
Monday  or  Tuesday.  In  particular,  Susan  considers  both  of  the  following  scenarios 
possible: 

•  Mary  received  the  letter  on  Tuesday  and,  knowing  that  Susan’s  husband  is 
faithful,  shot  her  own  husband  on  Tuesday  night. 

• Mary received the letter on Sunday and, knowing that Susan’s husband is unfaithful,
waited to see if Susan would shoot her husband on Sunday or Monday
night.  Since  Susan  did  not  shoot,  on  Tuesday  Mary  concluded  that  her  own 
husband  was  unfaithful,  and  shot  him. 

Thus,  Susan  cannot  determine  whether  her  own  husband  is  faithful  based  on  Mary’s 
actions.  Furthermore,  she  will  never  obtain  any  more  information  on  the  subject 
and  will  remain  in  doubt  forever. 

We  call  the  first  day  on  which  the  queen’s  letter  is  delivered  to  a  cheated  wife 
the  first  significant  day.  Given  Proposition  5.3,  it  is  easy  to  see  that  cheated  wives 
that  receive  the  queen’s  letter  on  the  first  significant  day  will  be  the  first  to  shoot 
their  husbands.  Do  any  other  cheated  wives  shoot  their  husbands? 

Every wife has an interval of b − 1 days in which a noisy night would leave her
in doubt regarding her husband’s fidelity. To see this, recall that a wife knowing
of, say, k > 0 unfaithful husbands does not initially know whether there are k or
k + 1 unfaithful husbands in all. Furthermore, for all she knows the first significant
day may happen anywhere between b − 1 days before she receives the queen’s letter and
b − 1 days after she receives it. “If there are k unfaithful husbands,” she reasons,
“then at least one of them will be shot on the ((k − 1)b + 1)th night after the day his
wife receives the letter, that is, between the ((k − 2)b + 2)nd and the kb-th night after
the day I receive the letter. If, however, there are k + 1 unfaithful husbands, one of
them will be shot between the ((k − 1)b + 2)nd and the (kb + 1)st night after the day I
receive the letter.” Thus, if the first shot occurs between the ((k − 1)b + 2)nd and the
kb-th night after the day she receives the queen’s letter, a wife initially knowing of
exactly k unfaithful husbands will be left in doubt regarding her husband’s fidelity.
Since a cheated wife that receives the queen’s letter after the first significant day
will hear a shot in her interval of uncertainty, we have:

Theorem 5.4: Using weakly synchronous broadcast, cheated wives that receive
the queen’s letter on the first significant day shoot their husbands (n − 1)b days
after the first significant day, where n is the number of unfaithful husbands. All
other cheated wives remain forever in doubt about their husbands’ fidelity. ■
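The interval of uncertainty can be computed directly from the two ranges derived above. The Python sketch below is our own illustration: it numbers nights relative to the day D on which the reasoning wife receives the letter (night 1 is the night of day D, night 0 the night before it), assumes no shooting delay, and uses hypothetical function names.

```python
def possible_first_shots(k, b, own_unfaithful):
    """Nights (relative to the day D this wife receives the letter; night 1 is
    the night of day D) on which the first shot may fall, with no delay.
    k: unfaithful husbands this wife knows of; b: delivery bound."""
    if own_unfaithful:
        # k + 1 unfaithful husbands: the first significant day F satisfies
        # D - (b - 1) <= F <= D, and the first shot is on night k*b + 1 after F.
        lo, hi = (k - 1) * b + 2, k * b + 1
    else:
        # k unfaithful husbands: D - (b - 1) <= F <= D + (b - 1), and the
        # first shot is on night (k - 1)*b + 1 after F.
        lo, hi = (k - 2) * b + 2, k * b
    return set(range(lo, hi + 1))

def interval_of_uncertainty(k, b):
    """First-shot nights that are consistent with both hypotheses."""
    return possible_first_shots(k, b, True) & possible_first_shots(k, b, False)

# Susan's situation in the example above: b = 2, she knows of k = 1 unfaithful
# husband (Mary's), and she receives the letter on Monday, so night 2 is
# Tuesday night.
susan = interval_of_uncertainty(1, 2)
```

Here `susan` is `{2}`: a shot on Tuesday night is exactly the one that leaves Susan in doubt. In general the overlap runs from night (k − 1)b + 2 to night kb, which contains b − 1 nights.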

How  could  Henrietta  III  have  changed  the  instructions  slightly  and  avoided  the 
problem?  Josephine  seems  to  suggest  that  this  could  have  been  done  by  requiring 
a  cheated  wife  to  wait  a  few  days  after  learning  of  her  husband’s  infidelity,  before 
shooting  him.  First  notice  that  the  wives’  reasoning  is  slowed  down  considerably  if 
the  shooting  happens  only  after  a  delay: 

Proposition 5.5: In a weakly synchronous mail system with bound b, if every
wife  is  required  to  wait  d  days  from  the  day  she  discovers  her  husband’s  infidelity 
before  shooting  him,  then  a  wife  that  knows  of  exactly  k  unfaithful  husbands  will 
know  that  her  own  husband  is  unfaithful  once  k(b  +  d)  silent  nights  pass  from  the 
day  she  receives  the  queen’s  letter  (and,  as  long  as  all  preceding  nights  are  silent, 
no  earlier!). 

Proof: Analogous to the proof of Proposition 5.3. For k = 0 the statement is
trivially true. Assume inductively that it holds for k and that Mary knows of k + 1
unfaithful husbands. Mary knows that if there are exactly k + 1 unfaithful husbands,
then every cheated wife knows of k unfaithful husbands. Thus, a cheated wife that
receives the queen’s letter on the first significant day (i.e., at least as early as any
other cheated wife) will know that her husband is unfaithful once k(b + d) silent
nights pass from the day she receives the letter. Ordinarily she would wish to shoot
on the (k(b + d) + 1)st night, but since she must delay d days, she will shoot her
husband on the (k(b + d) + d + 1)st night after receiving the letter. Since Mary
must consider it possible that the first significant day occurs as many as b − 1 days
after she receives the letter, Mary will know that her own husband is unfaithful
once k(b + d) + d + 1 + b − 1 = (k + 1)(b + d) silent nights pass, and no earlier. The
proposition follows by induction. ■

Josephine’s  claim  is  confirmed  by  the  following  theorem: 

Theorem 5.6: If the delay is sufficiently long, more precisely if d ≥ b − 1, then
all cheated wives shoot their husbands and no wife remains in doubt.
Proof: We use Proposition 5.5 in a fashion similar to that in which Theorem 5.4
uses Proposition 5.3. First, some notation is needed. Let F be the first significant
day, let D be the day Mary receives the letter, and let S be the day preceding the
night of the first shot. Notice that the proof of Proposition 5.5 implies that if n > 1
is the number of unfaithful husbands, then S = F + (n − 1)(b + d) + d + 1. Assume
that Mary knows of exactly k > 1 unfaithful husbands. Initially, as far as Mary is
concerned, there are two possibilities:

• Mary’s own husband is faithful. In this case Mary knows that D − (b − 1) ≤ F ≤
D + (b − 1). (Notice that Mary must consider the whole interval possible.) Since
the number of unfaithful husbands is k, it follows that S = F + (k − 1)(b + d) + d + 1.
Writing h for D + (k − 1)(b + d) + d + 1, Mary has:

$$h - (b - 1) \le S \le h + (b - 1).$$

• Mary’s own husband is unfaithful. In this case Mary knows that D − (b − 1) ≤
F ≤ D. (We must have F ≤ D, since otherwise the first cheated wife to receive
a letter does so after Mary does, contradicting the assumption that Mary’s
husband is unfaithful.) Also, S = F + k(b + d) + d + 1, because there are k + 1
unfaithful husbands. With h as above, Mary has:

$$h + (d + 1) \le S \le h + (b + d).$$

Therefore, if d + 1 > b − 1 (i.e., d ≥ b − 1), then Mary can distinguish these
possibilities (given that she knows S, h, b, and d), and thus is guaranteed to be
able to determine whether her husband is unfaithful. It is easy to present scenarios
that show that no smaller delay suffices. One such scenario is the example following
Proposition 5.3 above. There b = 2 and d = 0 = b − 2. ■
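The threshold in Theorem 5.6 can be read off the two closed intervals in the proof. The following Python sketch (our illustration, with hypothetical names) encodes each interval as offsets S − h and checks when they are disjoint.

```python
def first_shot_interval(b, d, own_unfaithful):
    """Closed range of S - h, where S is the day preceding the first shot and
    h = D + (k - 1)(b + d) + d + 1, under each of Mary's two hypotheses."""
    if own_unfaithful:
        return (d + 1, b + d)        # h + (d + 1) <= S <= h + (b + d)
    return (-(b - 1), b - 1)         # h - (b - 1) <= S <= h + (b - 1)

def mary_can_decide(b, d):
    """True iff the two intervals are disjoint, i.e. iff d + 1 > b - 1.
    The 'unfaithful' interval always ends to the right of where the
    'faithful' one begins, so only its lower end needs checking."""
    lo_unfaithful, _ = first_shot_interval(b, d, True)
    _, hi_faithful = first_shot_interval(b, d, False)
    return lo_unfaithful > hi_faithful
```

Note that k cancels out once both intervals are written relative to h, so the same delay d ≥ b − 1 works for every wife, whatever number of unfaithful husbands she knows of.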

Josephine  remarks: 

...  Of  course,  the  shrewd  residents  of  the  Wisegal  district  of  Mamajorca  avoided 
any  eventual  doubts  by  bribing  the  mailperson. 

We assume that the social attitude towards bribes in Mamajorca was quite different
from the attitude towards infidelity. Consequently, (it was common knowledge
that) bribery would be kept a secret between a bribing wife and her mailperson. It
is also known that delivering mail was not an acceptable profession for the wives
of Mamajorca. Thus, it was common knowledge that no wife knew of a wife that
bribed the mailperson. Given these circumstances, the following proposition
clarifies Josephine’s statement:
Proposition 5.7: In the weakly synchronous case, a wife that bribes the mailperson
into telling her when the queen had originally sent the letter does eventually
know whether her own husband is faithful.

Proof: Let the bound on delivery be b. Using Proposition 5.3, it is easy to show
by a straightforward induction that if there are k unfaithful husbands then the first
shot occurs between the ((k − 1)b + 1)st night and the kb-th night after the queen
sends the letter. Thus, a wife that knows of k unfaithful husbands and bribes her
mailperson knows that her husband is unfaithful if no shot is heard through the kb-th
night, and knows that he is faithful otherwise. The crucial point is that a wife
that bribes her mailperson knows which night is the kb-th night, and thus eventually
knows whether her husband is faithful. ∎
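The bribed wife's decision rule is simple enough to state as code. This is a minimal sketch under stated assumptions: nights are counted from the sending date that the bribe reveals, the window [(k − 1)b + 1, kb] from the induction is taken as given, and the function names are hypothetical. A shot heard on or before night kb is consistent only with her husband being faithful; silence through night kb means he is not.

```python
from typing import Optional


def first_shot_window(k: int, b: int) -> tuple[int, int]:
    """First and last night (counted from the queen's sending date) on which
    the first shot can occur when there are exactly k >= 1 unfaithful husbands.
    The bounds are taken from the induction in the proof."""
    return (k - 1) * b + 1, k * b


def bribed_wife_verdict(known: int, b: int, first_shot: Optional[int]) -> str:
    """Verdict of a wife who knows of `known` unfaithful husbands and, thanks to
    the bribe, knows the sending date.

    first_shot: night of the first shot she hears, or None if none has been
    heard through night known * b.
    """
    if known == 0:
        # Assumed, as in Henrietta I's letter: at least one husband is unfaithful.
        return "unfaithful"
    _, last_if_faithful = first_shot_window(known, b)
    if first_shot is not None and first_shot <= last_if_faithful:
        return "faithful"    # only possible if exactly `known` husbands are unfaithful
    return "unfaithful"      # silence through night known * b: there are known + 1


if __name__ == "__main__":
    # Knowing of 3 culprits with b = 2: a first shot on night 5 or 6 means "faithful".
    print(bribed_wife_verdict(3, 2, 6), bribed_wife_verdict(3, 2, None))  # → faithful unfaithful
```

The windows for k and k + 1 unfaithful husbands are adjacent and disjoint, which is exactly what lets a wife who knows the true sending date decide.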

Josephine  continues  with  the  reign  of  Henrietta  IV: 

Henrietta IV, who succeeded her mother as queen, concluded that the lack of
a calendar was the reason behind the injustice of her mother’s scheme. She summoned
the women of Mamajorca to the town square and announced the initiation
of a calendar beginning that day. “From this day on,” she said, “the mail system
will be strongly synchronous: every letter sent from the queen will bear the mailing
date,  and  will  be  guaranteed  to  be  delivered  to  all  of  her  subjects  within  less  than  b 
days.”  At  a  later  date,  Henrietta  IV  sent  her  subjects  a  letter  bearing  the  mailing 
date, and containing an exact copy of Henrietta I’s original instructions. A thousand
silent nights followed, and on the thousand and first day, Henrietta IV decided
to send another letter. She had finally realized that as a result of Henrietta III’s
great  injustice,  the  wives  of  Mamajorca  lost  much  of  their  faith  in  the  monarchy 
and  its  orders.  It  was  still  common  knowledge  that  the  queens  were  truthful,  and 
the  vast  majority  of  her  subjects  were  obedient,  but  it  was  no  longer  clear  that 
all  wives  would  obey  the  queen’s  orders.  Henrietta  IV’s  letter  contained  one  line: 
“There  is  at  least  one  obedient  wife  whose  husband  is  unfaithful.” 

Henrietta  IV’s  wisdom  was  greatly  appreciated  throughout  Atlantis,  and  her 
success  restored  her  subjects’  faith  in  the  monarchy. 

Let  us  see  why  the  obedient  wives  could  not  figure  out  whether  their  husbands 
were  faithful  before  receiving  Henrietta’s  second  letter: 

Proposition 5.8: In the strongly synchronous case, if there is exactly one cheated
wife, and she is disobedient, all of the other wives are in danger of shooting their
husbands on the second night. ∎
Clearly, if the other wives had not suspected that the cheated wife might be
disobedient, all of the faithful husbands would have been shot, whereas the unfaithful
husband would have survived! Notice that once this is a possibility, even if all
wives  are  in  fact  obedient  they  cannot  shoot.  To  see  this,  consider  the  case  in  which 
there  are  exactly  two  cheated  wives.  On  the  second  day  each  cheated  wife  cannot 
determine  whether  the  first  night  was  silent  because  her  own  husband  is  unfaithful 
or  because  the  other  cheated  wife  was  disobedient.  Thus,  no  shots  are  fired  on  the 
second  night.  Similarly,  no  shots  will  be  fired  on  any  later  nights.  It  is  now  easy  to 
show  by  induction  that  such  is  the  case  if  there  are  k  cheated  wives,  for  all  k  >  1. 
So  how  did  the  queen’s  second  letter  help? 

Theorem 5.9: In the strongly synchronous case, if it is common knowledge that
there is at least one obedient cheated wife, then all obedient cheated wives will
shoot their husbands.

Proof: The argument here is very similar to that of Theorem 5.1, with a slight
twist. If there is only one unfaithful husband, then his wife is the only cheated
wife. Since there is at least one obedient cheated wife, she must be obedient, and
therefore will shoot her husband on the day she receives the second letter. If there
are exactly k = 2 cheated wives, then each obedient cheated wife reasons as follows:
“If my husband is faithful, then the cheated wife I know of must be obedient, and
therefore will shoot her husband when she receives the letter, at most b − 1 days
after the queen sent it (on day b at the latest).” Thus, if by day b + 1 no shots
have been fired, an obedient cheated wife knows that her own husband is unfaithful,
and shoots him on that night. Assume inductively that if there are exactly
k ≥ 2 unfaithful husbands, then all obedient cheated wives shoot their husbands
on the (b + k − 1)st night. If there are exactly k + 1 unfaithful husbands, then
each obedient cheated wife knows of k unfaithful husbands, and knows that if her
own husband is faithful, then at least one unfaithful husband will be shot on the
(b + k − 1)st night. Thus, once that night is silent, she knows that (even though
she might be the only obedient cheated wife) her husband is unfaithful, and shoots
him on the (b + (k + 1) − 1)st night. The theorem follows by induction. ∎
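The schedule established by this induction can be exercised with a small simulation. It does not model the wives' knowledge; it simply applies the shooting rule that the proof derives, under the assumptions that nights are counted from the official sending date, that every letter arrives by night b, and that obedience is given as a fixed set. A wife who knows of m ≥ 1 unfaithful husbands shoots on night b + m unless she has already heard a shot, and a wife who knows of none shoots on the night her letter arrives. All names are hypothetical.

```python
def simulate_strongly_synchronous(unfaithful: set[int], num_wives: int, b: int,
                                  delivery: dict[int, int],
                                  obedient: set[int]) -> dict[int, int]:
    """Apply the shooting schedule derived in the proof; return {husband: night shot}.

    Wife i knows of m_i = |unfaithful - {i}| unfaithful husbands. delivery[i], with
    1 <= delivery[i] <= b, is the night (counted from the official sending date)
    on which she receives the letter. Only obedient wives act:
      * m_i == 0: she shoots on the night the letter reaches her;
      * m_i >= 1: she shoots on night b + m_i unless she heard a shot earlier.
    """
    shot: dict[int, int] = {}
    for night in range(1, b + num_wives + 1):
        heard_earlier = bool(shot)        # `shot` holds earlier nights only
        shooters = []
        for i in range(num_wives):
            if i not in obedient or i in shot:
                continue
            m = len(unfaithful - {i})
            if m == 0 and delivery[i] == night:
                shooters.append(i)
            elif m >= 1 and night == b + m and not heard_earlier:
                shooters.append(i)
        for i in shooters:
            shot[i] = night
    return shot


if __name__ == "__main__":
    b, wives = 3, 6
    delivery = {i: 1 + i % b for i in range(wives)}
    for culprits in ({2}, {0, 4}, {1, 3, 5}):
        result = simulate_strongly_synchronous(culprits, wives, b, delivery, set(range(wives)))
        print(sorted(culprits), result)
```

With n ≥ 2 unfaithful husbands, every obedient cheated wife fires on night b + n − 1, as the theorem states; a disobedient cheated wife simply spares her own husband, while the obedient ones still act on schedule.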

Observe the difference between the bribed dates case, described in Proposition
5.7, and the strongly synchronous case of Theorem 5.9. If all of the wives
bribed the mailperson, then all of the unfaithful husbands would be shot, and no
wife would remain in doubt regarding her husband’s fidelity. However, it takes at least
(n − 1)b + 1 days to eliminate n ≥ 2 cheating husbands. Before the end of the
process the wives would not necessarily know that justice would be done, and at
the end it would not be known whether any wife remains in doubt regarding her
own husband’s fidelity. In the strongly synchronous case, it takes b + n − 1 nights
to eliminate n ≥ 2 unfaithful husbands, and it is common knowledge that justice is
done. The difference between the two cases is best understood by noting that
in the first case every wife knew on what day the queen sent the letter, but no wife
knew that others knew, whereas in the strongly synchronous case the day on which
the queen sent the letter was common knowledge.

5.5  Ring-based  communication 

Josephine  describes  the  outcome  of  a  similar  approach  to  the  male  infidelity 
problem  in  the  neighboring  city-state  of  Mamaringa,  in  which  the  households  were 
arranged  in  a  ring: 

The  queens  of  the  neighboring  matriarchal  city-state  of  Mamaringa  commonly 
adopted  customs  and  rules  from  Mamajorca.  Thus,  Mamaringa  was  similar  to 
Mamajorca  in  all  respects,  except  that  its  households  were  built  in  a  ring  around 
the  great  Mt.  Rouge.  The  location  of  each  household  in  the  ring  was  common 
knowledge,  as  was  the  fact  that  mail  was  delivered  in  clockwise  order  around  the 
ring. 

The  queens  of  Mamaringa  tried  to  eliminate  the  infidelity  problem  by  sending 
Henrietta  I’s  letter  once  around  the  ring,  using  the  state-of-the-art  mail  system 
in  every  generation.  None  of  the  queens  of  Mamaringa  suffered  the  disgrace  of 
Henrietta  II,  and  none  attained  the  honor  of  Henrietta  IV.  They  will  all  be  forever 
remembered  as  cruel  and  unjust  queens. 

It  is  assumed  that  the  queens  of  Mamaringa  hoped  that  the  extra  knowledge  of 
the  order  in  which  letters  are  delivered  would  be  helpful  in  justly  eliminating  all 
unfaithful  husbands.  However,  the  asymmetry  introduced  by  this  knowledge  makes 
a  big  difference,  as  the  following  theorem  shows: 
Theorem 5.10:

(a)  In  asynchronous  delivery  around  a  ring,  the  last  cheated  wife  to  receive  the  letter 
will  shoot  her  husband.  All  others  will  not. 

(b)  In  weakly  synchronous  delivery  around  a  ring,  some  cheated  wives  will  shoot 
their  husbands,  but  some  might  not. 

(c)  In  strongly  synchronous  delivery  around  a  ring,  some  cheated  wives  will  shoot 
their  husbands,  but  some  might  not. 

Proof:  (a)  We  prove  by  induction  that  in  the  asynchronous  case  a  cheated  wife 
knowing  of  k  cheated  wives  that  are  all  notified  before  her,  and  knowing  that  no 
cheated  wives  will  be  notified  after  her,  will  shoot  her  husband  k  nights  after  she 
receives the queen’s letter (and no earlier). For k = 0 the claim is trivial. Assume
inductively  that  the  claim  holds  for  k  and  that  Mary  knows  of  k  cheated  wives  in  the 
ring  before  her,  and  none  after  her.  Once  she  receives  the  letter  Mary  knows  that 
the  last  of  the  k  cheated  wives  she  knows  about  has  received  the  letter  no  later  than 
the  same  day  Mary  did.  Thus,  if  Mary’s  husband  is  faithful  then  the  last  cheated 
wife  she  knows  of  will  shoot  her  own  husband  no  later  than  k  nights  after  Mary 
received  the  letter.  Once  that  fails  to  happen,  Mary  shoots  her  own  husband  on  the 
(k + 1)st night after receiving the letter. The claim follows by induction. To see that
no  other  cheated  wife  shoots  her  husband,  notice  that  because  of  the  asynchronous 
nature  of  delivery,  a  wife  knowing  of  a  cheated  wife  later  in  the  ring  does  not  know 
when  that  cheated  wife  will  receive  the  letter,  and  thus  cannot  deduce  from  the 
night  on  which  a  later  wife  shoots  that  her  own  husband  is  unfaithful  (although  in 
some cases she will be able to deduce that her own husband is faithful).

(b) The proof of Proposition 5.3 can be used to show that some unfaithful husbands
will be shot in this case. We need to show that injustice might occur, i.e.,
that  some  unfaithful  husbands  might  be  spared.  Consider  the  following  scenario: 
the  bound  on  delivery  is  b  =  2.  Mary  knows  of  only  one  cheated  wife,  Susan,  who 
lives  farther  down  the  ring  than  Mary.  Mary  receives  the  letter  on  Sunday  and 
hears  Susan  shoot  her  husband  on  Monday.  Mary  cannot  distinguish  between  the 
following  possibilities: 

•  Susan  received  the  letter  on  Sunday,  and  knowing  that  Mary’s  husband  was 
unfaithful,  she  waited  to  hear  if  Mary  would  shoot  on  Sunday  night.  When  she 
didn’t,  Susan  discovered  that  her  own  husband  was  unfaithful,  and  shot  him 
Monday  night. 
• Susan received the letter on Monday, and knowing that Mary’s husband was
faithful,  discovered  that  her  own  husband  was  unfaithful  and  shot  him  that 
night. 

Thus,  Mary  does  not  know  whether  her  husband  is  unfaithful  in  the  above  scenario, 
and  does  not  shoot  her  husband.  If  her  husband  is  in  fact  unfaithful,  this  constitutes 
a  case  of  injustice. 

(c) The proof of Proposition 5.3 again assures us that some husbands will be shot.
To  show  that  a  case  of  injustice  can  arise  with  strongly  synchronous  delivery  around 
a  ring,  consider  the  situation  described  in  (b)  above,  with  Sunday  being  the  official 
sending  date  of  the  letter.  Mary  still  considers  both  of  the  above  scenarios  possible, 
and  Mary’s  husband  is  spared.  Thus,  if  Mary’s  husband  is  unfaithful,  a  case  of 
injustice occurs. ∎

Notice  that  in  the  asynchronous  case  knowing  the  order  of  delivery  does  help  a 
cheated  wife  (in  this  case  only  the  last  cheated  wife)  discover  that  her  husband  is 
unfaithful. In this case the extra knowledge can be considered “helpful”. However,
more  surprising  is  the  fact  that  the  wives’  knowing  the  order  of  delivery  allows  an 
unjust  solution  in  the  strongly  synchronous  case,  where  none  existed  without  such 
knowledge!  Thus,  by  introducing  an  asymmetry  in  the  wives’  reasoning,  this  extra 
knowledge  has  a  negative  effect  on  the  solution. 

5.6  Quick  elimination 

Queen  Margaret  opened  a  new  era  in  Mamajorca.  She  made  the  mail  system 
an express mail system: All letters sent from her court were guaranteed to be
delivered  to  all  of  her  subjects  on  the  day  they  were  sent.  Her  first  letter  notified 
her  subjects  about  the  great  advance  in  their  communication  capabilities. 

Margaret  was  an  impatient  queen.  She  knew  that  using  her  mail  system  she 
could  successfully  execute  Henrietta  I’s  instructions.  However,  knowing  that  there 
were  many  unfaithful  husbands  in  Mamajorca,  and  not  wanting  to  wait  very  long 
for  them  to  be  eliminated,  she  decided  to  look  for  a  faster  way  to  solve  the  problem. 

She  did  so  by  giving  her  subjects  instructions  that  allowed  wives  to  shoot  into  the 
air at midnight. Margaret’s scheme was very successful; the unfaithful husbands
were  eliminated  from  Mamajorca  in  just  a  few  days. 


Notice that in Henrietta I’s solution, n unfaithful husbands are eliminated on
the nth night following the queen’s announcement. Margaret sought a solution whose
number of nights would not grow linearly with n. Given that shooting in the air at
midnight is allowed, what is the minimal number of nights in which the unfaithful
husbands  can  be  eliminated?  Margaret’s  problem  can  be  restated  as  follows:  Given 
a  distributed  system  in  which  the  processors  share  a  memory  consisting  of  a  single 
toggle  bit,  each  processor  has  a  value,  and  it  is  known  that  the  values  are  at  most 
one  apart,  how  many  rounds  of  communication  are  needed  for  the  processors  with 
the  minimal  value  to  know  it?  El  Gamal  and  Orlitsky  (cf.  [EO])  have  treated  similar 
questions  independently  in  a  more  general  setting.  The  following  theorem  answers 
this  question  in  Margaret’s  case: 

Theorem  5.11:  There  is  a  protocol  that  allows  shooting  in  the  air  in  which  the 
cheating  husbands  are  all  shot  by  the  third  night.  That  is  the  best  possible. 

Proof:  Let  us  first  show  that  a  protocol  in  which  a  wife’s  actions  depend  only  on 
the  number  of  unfaithful  husbands  she  initially  knows  of  and  the  actual  run  of  the 
protocol must require at least three nights. Such a protocol P can be viewed as a
set of protocols P(k), k ≥ 0, each specifying how a wife initially knowing of exactly
k unfaithful husbands should act. If for some k ≥ 1 both P(k − 1) and P(k + 1) do
not prescribe any shooting on the first night, then clearly P(k) must require at least
three nights, since a wife knowing of k unfaithful husbands cannot know whether
her own husband is faithful after the first night. If P(k′) includes shooting in the
air on the first night for some k′ ≥ 1, then P(k′) must require at least three nights
when there are k′ + 1 unfaithful husbands: a wife knowing of exactly k′ unfaithful
husbands shoots in the air on the first night, and cannot determine whether her
own husband is unfaithful before the second night. Thus, for all k ≥ 1, one of P(k),
P(k + 1), or P(k + 2) must require at least three nights.

The  following  protocol  solves  the  problem  in  three  nights: 

(a) A wife knowing of k_0 unfaithful husbands, with k_0 ≡ 0 (mod 3), fires
her gun at midnight on the first night. If k_0 = 0 she shoots her husband,
otherwise she shoots in the air.

(b0) If there was no shot on the first night, then a wife knowing of k_1 unfaithful
husbands, with k_1 ≡ 1 (mod 3), should shoot her husband on the second
night.

(b1) If there was a shot on the first night, then a wife knowing of k_2 unfaithful
husbands, with k_2 ≡ 2 (mod 3), should shoot her husband on the second
night.

(c00) If both of the first two nights were silent, then all wives shoot their husbands
on the third night.
(c10) If there was a shot on the first night, and no shots on the second night,
then the first-night shooters shoot their husbands on the third night (if
they are still alive).

Let us briefly check that this protocol is correct; i.e., we now show that if there
is at least one unfaithful husband, then all unfaithful husbands are shot, and no
faithful husbands are shot. We first consider the case where there is at least one
faithful husband. Thus, if n = k + 1 is the number of unfaithful husbands, then
some wives know of k + 1 unfaithful husbands, and some of k. If k ≡ 2 (mod 3),
then the wives whose husbands are faithful will shoot in the air on the first night.
The cheated wives will shoot their husbands on the second night according to step
(b1). If k = 0 then the cheated wife will shoot her husband on the first night and
no other shooting occurs. If k ≡ 0 (mod 3) and k > 0, then the cheated wives
shoot in the air on the first night, the other wives are silent on the second night,
and the cheated wives shoot their husbands on the third night. If k ≡ 1 (mod 3)
then the first night is silent, and the cheated wives shoot their husbands on the
second night by (b0). We now need to show that if all wives are cheated then the
husbands are shot. This is simple, since in all cases a wife that hears no shots other
than on nights she shoots ends up shooting her husband (check!). ∎
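Since the protocol looks only at how many unfaithful husbands each wife knows of and at one bit per night, its correctness can also be confirmed by brute force for small populations. The sketch below encodes steps (a), (b0), (b1), (c00), and (c10) literally; it is an illustration, not the thesis's own code, and the function names are hypothetical.

```python
from itertools import combinations


def margaret_protocol(unfaithful: frozenset[int], num_wives: int) -> dict[int, int]:
    """Run steps (a), (b0), (b1), (c00), (c10); return {husband: night he is shot}.

    Each night the wives observe a single bit (was any gun fired?), which plays
    the role of the shared toggle bit in the restated problem.
    """
    known = {i: len(unfaithful - {i}) for i in range(num_wives)}
    shot: dict[int, int] = {}

    # Night 1, step (a): wives with k0 % 3 == 0 fire; at the husband only if k0 == 0.
    first_shooters = {i for i in range(num_wives) if known[i] % 3 == 0}
    for i in first_shooters:
        if known[i] == 0:
            shot[i] = 1
    night1_shot = bool(first_shooters)

    # Night 2, steps (b0)/(b1): residue 1 after a silent night, residue 2 after a shot.
    residue = 2 if night1_shot else 1
    second_shooters = {i for i in range(num_wives)
                       if i not in shot and known[i] % 3 == residue}
    for i in second_shooters:
        shot[i] = 2
    night2_shot = bool(second_shooters)

    # Night 3, steps (c00)/(c10).
    if not night1_shot and not night2_shot:
        third_shooters = set(range(num_wives))
    elif night1_shot and not night2_shot:
        third_shooters = first_shooters
    else:
        third_shooters = set()
    for i in third_shooters:
        shot.setdefault(i, 3)              # "if he is still alive"
    return shot


def protocol_is_correct(num_wives: int) -> bool:
    """Check every nonempty set of unfaithful husbands among num_wives households."""
    for n in range(1, num_wives + 1):
        for culprits in combinations(range(num_wives), n):
            result = margaret_protocol(frozenset(culprits), num_wives)
            if set(result) != set(culprits) or max(result.values()) > 3:
                return False
    return True


if __name__ == "__main__":
    print(all(protocol_is_correct(w) for w in range(1, 10)))  # → True
```

For every nonempty set of unfaithful husbands among up to nine households, exactly the unfaithful husbands are shot, and none later than the third night.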

Notice  that  Margaret  could  have  appended  the  above  protocol  to  Henrietta  I’s 
letter;  using  it,  a  cheated  wife  always  shoots  her  husband  on  the  midnight  of  the  day 
she  discovers  his  infidelity.  In  fact,  a  slightly  more  elaborate  lower  bound  argument 
of a similar flavor shows that it is the only protocol Margaret could have appended
to Henrietta I’s letter that is guaranteed to terminate in three nights. We remark
that by slightly changing steps (a) and (c10) in the above protocol it is possible to
obtain  a  protocol  that  works  correctly  even  if  there  are  no  unfaithful  husbands.  (Of 
course,  in  the  modified  protocol  a  wife  knowing  of  no  unfaithful  husband  will  not 
shoot  her  husband  on  the  first  night,  and  thus  such  a  protocol  cannot  be  appended 
to  Henrietta  I’s  letter.)  Details  are  left  to  the  reader. 
5.7 Discussion

The  cheating  husbands  problem  is  one  in  which  communication,  knowledge, 
and  action  interact  in  subtle  ways.  We  have  presented  a  case  analysis  of  variants 
of  this  problem  given  different  communication  mediums  and  different  degrees  of 
clock  synchronization.  This  problem  demonstrates  how  sensitive  the  success  of  an 
operation  can  be  to  the  known  properties  of  the  communication  medium.  It  also 
shows  how  knowledge  can  be  obtained  in  indirect  ways  by  observing  the  actions 
of  elements  in  the  system,  once  we  know  something  about  how  their  actions  are 
related  to  the  facts  they  know.  In  fact,  as  we  see  in  the  bribery  of  the  mailperson 
in  Proposition  5.7,  obtaining  knowledge  about  the  delivery  times  of  a  single  letter 
can  in  some  cases  dramatically  improve  a  wife’s  capability  to  act. 

The  queens’  instructions  in  all  cases  can  be  viewed  as  knowledge-based  protocols 
in the sense of [HF], since the actions that a wife is required to take depend on her
knowledge.  The  basic  high-level  “knowledge-based”  protocol  that  the  wives  follow 
is: 

Do not discuss the matter of your husband’s fidelity with anyone. However,
should you discover that your husband is unfaithful, you must shoot him
on the midnight of the day you find out about it.

Consider  a  scenario  in  which  the  queen’s  letter  reaches  all  of  the  wives  on  the 
day  it  is  sent.  The  actual  way  in  which  the  above  protocol  will  be  carried  out 
(“implemented”) will depend on the known properties of the mail system. As our
analysis shows, the elimination of the n + 1 unfaithful husbands may take n + b nights,
it  may  take  nb  nights,  and  it  might  never  happen  at  all,  depending  on  whether  the 
mail  system  is  commonly  known  to  be  strongly  synchronous,  weakly  synchronous, 
or  asynchronous,  and  the  order  of  message  delivery  is  unknown.  Thus,  the  execution 
of  the  protocol  and  its  success  depend  not  only  on  what  actually  happens  (in  this 
case,  all  letters  being  delivered  on  the  same  day);  they  also  crucially  depend  on  the 
wives’  state  of  knowledge  of  what  happens. 

Another  interesting  point  that  arises  here  is  that  additional  knowledge  can  in 
some  cases  be  harmful.  The  results  of  Theorem  5.10  show  that  running  the  same 
knowledge-based  protocol  in  a  situation  where  the  wives  initially  have  strictly  more 
knowledge  (they  know  the  order  of  delivery  of  a  message  broadcast  around  a  ring) 
can  result  in  a  less  desirable  outcome.  The  ignorance  present  when  the  order  of 
delivery  is  unknown  gives  rise  to  states  of  knowledge  that  allow  the  wives  to  all 
successfully  determine  whether  their  husbands  are  faithful. 
Our analysis in the previous chapters focussed on how uncertainties in behavior
of  the  communication  medium  and  the  relative  readings  of  clocks  affect  the  states  of 
knowledge  attainable  and  the  actions  that  can  be  performed  in  the  system.  In  this 
chapter  we  look  at  how  the  uncertainty  regarding  reliability  of  processors  can  affect 
the  actions  that  can  be  performed  in  the  system.  The  systems  we  consider  in  this 
chapter have a particularly simple and reliable structure in terms of clocks and communication
mediums, but the processors in the system are unreliable. We will thus
need  to  modify  the  formalism  presented  in  Chapter  2  somewhat  for  the  purposes  of 
this  analysis.  Consequently,  this  chapter  is  to  a  large  extent  self-contained. 

6.1  Introduction 

The problem of designing effective protocols for distributed systems whose components
are unreliable is both important and difficult. In general, a protocol for a
distributed system in which all components are liable to fail cannot unconditionally
guarantee to achieve non-trivial goals. In particular, if all processors in the
system  fail  at  an  early  stage  of  an  execution  of  the  protocol,  then  fairly  little  will 
be  achieved  regardless  of  what  actions  the  protocol  intended  for  the  processors  to 
perform.  However,  such  universal  failures  are  not  very  common  in  practice,  and  we 
are  often  faced  with  the  problem  of  seeking  protocols  that  will  function  correctly 
so  long  as  the  number,  type,  and  pattern  of  failures  during  the  execution  of  the 
protocol  are  reasonably  limited.  A  requirement  that  is  often  made  of  such  protocols 
is t-resiliency: that they be guaranteed to achieve a particular goal so long as no
more  than  t  processors  fail. 

A  good  example  of  a  desirable  goal  for  a  protocol  in  an  unreliable  system  is  called 
Simultaneous  Byzantine  Agreement  (SBA),  a  variant  of  the  Byzantine  agreement 
problem  introduced  in  [PSL]: 
Given are n processors, at most t of which might be faulty. Each processor
p_i has an initial value v_i ∈ {0, 1}. Required is a protocol with the following
properties:

1. Every non-faulty processor p_i irreversibly “decides” on a value v ∈ {0, 1}.

2. The non-faulty processors all decide on the same value.

3. The non-faulty processors all decide simultaneously, i.e., in the same
round of computation.

4. If all initial values v_i are identical, then all non-faulty processors decide v_i.

Throughout  this  chapter  we  will  use  t  to  denote  an  upper  bound  on  the  number 
of  faulty  processors.  We  call  a  distributed  system  whose  processors  are  unreliable 
a  Byzantine  environment. 

The  Byzantine  agreement  problem  embodies  some  of  the  fundamental  issues 
involved  in  the  design  of  effective  protocols  for  unreliable  systems,  and  has  been 
studied extensively in the literature (see [Fis] for a survey). Interestingly, although
many researchers have obtained a good intuition for the Byzantine agreement problem,
many aspects of this problem still seem mysterious, and
the general rules underlying some of the phenomena related to it are still unclear.

what  facts  can  become  common  knowledge  at  different  points  in  the  execution  of 
a t-resilient protocol. We restrict our attention to systems in which communication
is  synchronous  and  reliable,  and  the  only  type  of  processor  faults  possible  are  crash 
failures:  a  faulty  processor  might  crash  at  some  point,  after  which  it  sends  no 
messages  at  all.  Despite  the  fact  that  crash  failures  are  relatively  benign,  and 
dealing  with  arbitrary  possibly  malicious  failures  is  often  more  complicated,  work 
on  the  Byzantine  agreement  problem  has  shown  that  many  of  the  difficulties  of 
working  in  a  Byzantine  environment  are  already  exhibited  in  this  model.  In  the 
sequel  we  will  use  SBA  as  our  standard  example  of  a  desirable  simultaneous  action. 
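To make the model concrete, here is a sketch of a simple t-resilient SBA protocol of the well-known flooding kind, running in the synchronous crash-failure setting just described. It is not the optimal protocol developed in this chapter: it always decides at the end of round t + 1. Each processor repeatedly relays every initial value it has seen; the decision rule (decide 0 if a 0 has been seen, otherwise 1) and all names are assumptions of the sketch. Agreement holds because with at most t crashes, one of the t + 1 rounds contains no crash, after which all processors still alive hold the same set of values.

```python
import random


def floodset_sba(initial: list[int], t: int,
                 crashes: dict[int, tuple[int, set[int]]]) -> dict[int, int]:
    """Simulate t + 1 synchronous rounds of value flooding with crash failures.

    initial[p] is p's initial value in {0, 1}. crashes[p] = (k, reached) means p
    crashes in round k: in that round only the processors in `reached` receive its
    message, and afterwards it sends nothing. Returns the decisions of the
    non-faulty processors, all taken at the end of round t + 1.
    """
    n = len(initial)
    seen = [{v} for v in initial]
    for rnd in range(1, t + 2):
        outgoing = [set(s) for s in seen]     # messages are sent before any are received
        for p in range(n):
            crash_round, reached = crashes.get(p, (None, None))
            if crash_round is not None and crash_round < rnd:
                continue                      # crashed in an earlier round: silent
            for q in range(n):
                if crash_round == rnd and q not in reached:
                    continue                  # crash round: this message is lost
                seen[q] |= outgoing[p]
    # Decision rule (an assumption of this sketch): 0 if a 0 was seen, else 1.
    return {p: min(seen[p]) for p in range(n) if p not in crashes}


if __name__ == "__main__":
    # p1 holds the only 0 and crashes in round 1, reaching only p2; p2 relays it in round 2.
    print(floodset_sba([1, 0, 1], t=1, crashes={1: (1, {2})}))  # → {0: 0, 2: 0}

    # Randomized check of agreement and validity with at most t crashes.
    rng = random.Random(0)
    for _ in range(1000):
        n = rng.randint(2, 6)
        t = rng.randint(0, n - 1)
        initial = [rng.randint(0, 1) for _ in range(n)]
        faulty = rng.sample(range(n), rng.randint(0, t))
        crashes = {p: (rng.randint(1, t + 1), set(rng.sample(range(n), rng.randint(0, n))))
                   for p in faulty}
        decisions = set(floodset_sba(initial, t, crashes).values())
        assert len(decisions) == 1
        assert len(set(initial)) > 1 or decisions == {initial[0]}
```

Running the same example with only one round (t = 0 instead of t = 1) would leave p0 unaware of the 0, which illustrates why a fixed-length flooding protocol needs t + 1 rounds in the worst case.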

Our  analysis  provides  new  insight  into  the  basic  issues  involved  in  performing 
simultaneous  actions  in  a  Byzantine  environment.  For  example,  it  shows  that  the 
pattern in which failures occur completely determines the number of rounds required
to attain common knowledge of facts about the initial state of the system.
Consequently, we obtain a complete characterization of the patterns of failures that
require a t-resilient protocol for SBA to take k rounds, for 2 ≤ k ≤ t + 1. This
generalizes the well-known fact that SBA requires t + 1 rounds in the worst case
(cf. [DLM], [DS], [CD], [FL], [H], [LF]). Our proof is a simplification of the well-known
lower bound proof for SBA. Interestingly, our analysis immediately suggests a protocol
for SBA that is optimal in all runs. That is, it halts as early as possible, given
the pattern in which failures occur. In many cases this turns out to be much earlier
than in any protocol previously known. This is the first protocol for SBA that is
optimal in all runs. In fact, it is the first protocol for SBA that ever halts before
the end of round t + 1. The t + 1 round lower bound on the worst-case behavior of
protocols for SBA has often been misinterpreted to mean that SBA cannot ever be
reached in fewer than t + 1 rounds.

The  analysis  presented  in  this  chapter  applies  to  a  large  class  of  simultaneous 
actions,  not  only  to  SBA.  For  example,  we  present  the  bivalent  agreement  problem, 
in  which  clause  (4)  of  SBA  is  replaced  by  a  requirement  that  the  protocol  have 
at  least  one  run  in  which  the  processors  decide  0,  and  at  least  one  run  in  which 
they  decide  1.  We  derive  a  protocol  that  always  reaches  bivalent  agreement  in  two 
rounds.  This  contradicts  a  “folk  conjecture”  in  the  field  that  states  that  performing 
any non-trivial task simultaneously in a Byzantine environment requires t + 1 rounds
in  the  worst  case. 

The  main  contribution  of  this  chapter  is  to  illustrate  how  a  knowledge-based 
analysis of protocols in a Byzantine environment can provide insight into the fundamental
properties of such systems. This insight is very useful in the design of
improved  t-resilient  protocols  for  Byzantine  agreement  and  many  related  problems. 
The  analysis  also  provides  some  insight  into  how  assumptions  about  the  reliability 
of  the  system  affect  the  states  of  knowledge  attainable  in  the  system.  We  briefly 
consider  some  other  reliability  assumptions  and  apply  our  analysis  to  them. 

Section  6.2  contains  the  basic  definitions  and  some  of  the  fundamental  properties 
of  our  model  of  a  distributed  system  and  of  knowledge  in  a  distributed  system. 
Section  6.3  investigates  the  states  of  knowledge  attainable  in  a  particular  fairly 
general  protocol.  Section  6.4  contains  an  analysis  of  the  lower  bounds  corresponding 
to the analysis of Section 6.3, simplifying and generalizing the well-known t + 1 round
worst-case  lower  bound  for  reaching  SBA.  Section  6.5  discusses  some  applications  of 
our  analysis  to  problems  related  to  SBA,  and  Section  6.6  includes  some  concluding 
remarks. 
6.2 Definitions and preliminary results

In  this  section  we  present  a  number  of  basic  definitions  that  will  be  used  in 
the  rest  of  the  chapter,  and  discuss  some  of  their  implications.  Our  treatment 
will  generally  follow  along  the  lines  of  Chapter  2,  simplified  and  modified  for  our 
purposes. 

We consider a synchronous distributed system consisting of a finite collection
of n ≥ 2 processors (automata) {p_1, p_2, …, p_n}, each pair of which is connected by
a two-way communication link. The processors share a discrete global clock that
starts out at time 0 and advances by increments of one. Communication in the
system proceeds in a sequence of rounds, with round k taking place between time
k − 1 and time k. In each round, every processor first sends the messages it needs
to send to other processors, and then it receives the messages that were sent to it
by other processors in the same round. The identity of the sender and destination
of each message, as well as the round in which it is sent, are assumed to be part of
the message. At any given time, a processor’s message history consists of the set
of messages it has sent and received. Every processor p starts out with some initial
state σ. A processor’s view at any given time consists of its initial state, message
history, and the time on the global clock. We think of the processors as following a
protocol, which specifies exactly what messages each processor is required to send
(and what other actions the processor should take) at each round, as a deterministic
function of the processor’s view. However, a processor might be faulty, in which
case  it  might  commit  a  stopping  failure  at  an  arbitrary  round  k  >  0.  If  a  processor 
commits  a  stopping  failure  at  round  k  (or  simply  fails  at  round  k),  then  it  obeys  its 
protocol  in  all  rounds  preceding  round  k,  it  does  not  send  any  messages  in  the  rounds 
following  k,  and  in  round  k  it  sends  an  arbitrary  (not  necessarily  strict)  subset  of 
the  messages  it  is  required  by  its  protocol  to  send.  (Since  a  failed  processor  sends 
no  further  messages,  we  need  not  make  any  assumptions  regarding  what  messages  it 
receives  in  its  failing  round  and  in  later  rounds.)  For  technical  reasons,  we  assume 
that  once  a  processor  fails,  its  view  becomes  a  distinguished  failed  view.  The  set 
A  of  active  processors  at  time  k  consists  of  all  of  the  processors  that  did  not  fail  in 
the  first  k  rounds. 

A  run  r  of  such  a  system  is  a  complete  history  of  its  behavior,  from  time  0  until 
the  end  of  time.  This  includes  each  processor’s  initial  state,  message  history,  and,  if 
the processor fails, the round in which it fails. An execution (sometimes also called
a point) is a pair (r, k), where r is a run and k is a natural number. We will use (r, k)
to refer to the state of r after its first k rounds. Two executions (r, k) and (r', k)
will be considered equal if all processors start in the same initial states and display
the same behavior in the first k rounds of r and r'. The list of the processors’ initial
states is called the system’s initial configuration. We denote processor p’s view at
(r, k) by v(p, r, k). Furthermore, we will sometimes parameterize the set A of active
processors by the particular execution, denoted A(r, k).

We  will  find  it  useful  to  talk  about  the  pattern  in  which  failures  occur  in  a  given 
run. Formally, a failure pattern π is a set of triples of the form (p, k(p), Q(p)),
where p is a processor, k(p) is a round number, and Q(p) is a set of processors. A
run r displays (or, more precisely, is consistent with) the failure pattern π if every
processor that fails in r is the first element of some triple in π, and for every triple
(p, k(p), Q(p)) in π it is the case that processor p fails in round k(p) of r, and in
round k(p) it sends no messages to the processors in Q(p), and does send messages to
all processors not in Q(p) to which the protocol prescribes it to send. A protocol 𝒫,
initial configuration σ, and failure pattern π uniquely determine a run. (However, in
protocols that do not require all processors to send messages to all other processors
in every round, a run of the protocol may be consistent with more than one failure
pattern.) We denote this run by 𝒫(σ, π).
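The run 𝒫(σ, π) can be computed mechanically. The following Python sketch is a hypothetical illustration of the model, not code from this thesis: processors are indexed 0, …, n − 1 instead of p_1, …, p_n; a protocol is modelled as a function `protocol(i, view)` returning a dict from destinations to payloads (other actions are ignored); a failure pattern `pi` maps each faulty processor to its pair (k(p), Q(p)); and the string "failed" plays the role of the distinguished failed view.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Set, Tuple

FAILED = "failed"  # the distinguished failed view
Message = Tuple[int, int, int, Hashable]   # (sender, destination, round, payload)
Pattern = Dict[int, Tuple[int, Set[int]]]  # p -> (k(p), Q(p))


def simulate(protocol: Callable[[int, tuple], Dict[int, Hashable]],
             sigma: List[Hashable], pi: Pattern, rounds: int) -> List[Dict[int, object]]:
    """Return views[k][i], the view of processor i at time k, for k = 0..rounds.

    A non-failed view is the triple (initial state, message history, time).
    """
    n = len(sigma)
    history: Dict[int, FrozenSet[Message]] = {i: frozenset() for i in range(n)}
    views: List[Dict[int, object]] = [{i: (sigma[i], history[i], 0) for i in range(n)}]
    for k in range(1, rounds + 1):
        prev = views[-1]
        sent: List[Message] = []
        for i in range(n):
            if prev[i] == FAILED:
                continue  # a processor that has failed sends nothing more
            fails_now = i in pi and pi[i][0] == k
            for dest, payload in protocol(i, prev[i]).items():
                if fails_now and dest in pi[i][1]:
                    continue  # in its failing round, p omits its messages to Q(p)
                sent.append((i, dest, k, payload))
        current: Dict[int, object] = {}
        for i in range(n):
            if prev[i] == FAILED or (i in pi and pi[i][0] == k):
                current[i] = FAILED  # the view becomes the failed view
                continue
            mine = frozenset(m for m in sent if i in (m[0], m[1]))
            history[i] = history[i] | mine
            current[i] = (sigma[i], history[i], k)
        views.append(current)
    return views


def active(views_at_k: Dict[int, object]) -> Set[int]:
    """The set A of processors that did not fail in the rounds simulated so far."""
    return {i for i, v in views_at_k.items() if v != FAILED}


# Usage: a hypothetical protocol that sends the initial state to everyone each round.
def broadcast_initial(i: int, view: tuple) -> Dict[int, Hashable]:
    return {j: view[0] for j in range(3) if j != i}


# Index 0 fails in round 1 and omits its message to index 2.
views = simulate(broadcast_initial, [0, 1, 1], {0: (1, {2})}, rounds=2)
print(active(views[1]))  # {1, 2}
```

In this run index 1 receives the round 1 message of the failing processor while index 2 does not, which is exactly the asymmetry that the set Q(p) of a failure pattern records.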

Following Chapter 2, we identify a distributed system with the set S of the
possible runs of a particular fixed protocol 𝒫 = (P(1), …, P(n)), where P(i) is the
part of the protocol followed by processor p_i. This set essentially encodes all of the
relevant information about the execution of the protocol in the system. Given a
system S, for 1 ≤ i ≤ n let Σ_i be the set of initial states that processor p_i assumes
in the runs of S. The system S is said to be a t-uniform system for 𝒫 if there is a
set of initial configurations 𝒯 ⊆ Σ_1 × ⋯ × Σ_n such that S is the set of all runs of the
protocol 𝒫 starting in initial configurations from 𝒯 in which at most t processors
fail. In a t-uniform system, a processor failure is an event that is independent of
the initial configuration and of the time at which other processors fail. A system
is said to be independent if its set of initial configurations is of the form
Σ_1 × ⋯ × Σ_n. In an independent t-uniform system there is no necessary
dependence between the initial states of the different processors. The properties of
t-resilient protocols can be studied by analyzing particular t-uniform systems for
them. For example, a given protocol is a t-resilient protocol for SBA if all runs of
the independent t-uniform system in which the set of possible initial configurations
is {0, 1}^n satisfy the requirements of SBA.

We  assume  the  existence  of  an  underlying  logical  language  for  representing 
ground  facts  about  the  system.  By  ground  we  mean  facts  about  the  state  of  the 
system that do not explicitly mention processors’ knowledge. Formally, a ground
fact φ will be identified with a set of executions τ(φ) ⊆ S × N, where N is the set of
natural numbers. Given a run r ∈ S of the system and a time k, we will say that φ
holds at (r, k), denoted (S, r, k) ⊨ φ, iff (r, k) ∈ τ(φ). We will define various ground
facts as we go along. The set of executions corresponding to these facts will be clear
from the context. We close this language under the standard boolean connectives
∧, ¬ and ⊃, interpreted as the standard conjunction, negation and implication.

Given a system S, we now define what facts a processor is said to “know” at
any given point (r, k) for r ∈ S. Roughly speaking, p_i is said to know a fact φ if
φ is guaranteed to hold, given p_i’s view of the run. More formally, given a system
S, we say that two points (r, k) and (r', k') are p_i-equivalent relative to S, denoted
(r, k) ↔_i (r', k'), iff r, r' ∈ S and v(p_i, r, k) = v(p_i, r', k'). (Since a view includes
the time on the global clock, the only case in which v(p_i, r, k) = v(p_i, r', k') is
possible for k' ≠ k is when v(p_i, r, k) = failed.) We say that a processor p_i knows
a fact φ in S at (r, k), denoted (S, r, k) ⊨ K_iφ, if (S, r', k') ⊨ φ for all executions
(r', k') ∈ S × N satisfying (r, k) ↔_i (r', k'). This definition of knowledge is
essentially the total view interpretation of Chapter 2. We now review some of the
properties of knowledge under this definition. Other properties will be covered in
the sequel (see also Chapter 2 and [HM]).
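As a small illustration of the definition, the sketch below evaluates K_i on a finite set of points. It is a hypothetical Python example under simplifying assumptions: all points are taken at time 0, so a processor's view is just its initial state (the empty message history and the clock value are omitted), and the names `knows`, `view_at_time_0` and `p0_started_with_1` are made up for the example.

```python
from itertools import product
from typing import Callable, Hashable, Sequence, Tuple

Config = Tuple[int, ...]


def knows(i: int, fact: Callable[[Config], bool], point: Config,
          points: Sequence[Config], view: Callable[[int, Config], Hashable]) -> bool:
    """K_i fact holds at point iff fact holds at every point with the same p_i-view."""
    here = view(i, point)
    return all(fact(q) for q in points if view(i, q) == here)


# Time-0 points of a two-processor system with binary initial states.
points = list(product([0, 1], repeat=2))


def view_at_time_0(i: int, config: Config) -> Hashable:
    return config[i]  # no messages yet, so the view is just the initial state


def p0_started_with_1(config: Config) -> bool:
    return config[0] == 1


print(knows(0, p0_started_with_1, (1, 0), points, view_at_time_0))  # True
print(knows(1, p0_started_with_1, (1, 0), points, view_at_time_0))  # False
```

The second call fails because (0, 0) gives the second processor the same view as (1, 0), and the fact is false there.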

A formula is said to be valid if it is true of all executions in all systems. Given
a system S, a formula is said to be valid in S if it is true of all executions of S. It
follows that a valid fact is valid in S for all systems S. We now show that under our
definition of knowledge, K_i satisfies the axioms of the modal system S5. This fact
will follow in a straightforward way from the fact that knowledge is determined by
the ↔_i relations, which in our case are equivalence relations.

Proposition 6.1:

a) If φ is valid in S then K_iφ is valid in S.

b) The consequence closure axiom is valid:

CONSEQUENCE CLOSURE: (K_iφ ∧ K_i(φ ⊃ ψ)) ⊃ K_iψ.

c) The knowledge axiom is valid:

KNOWLEDGE AXIOM: K_iφ ⊃ φ.

d) The positive introspection axiom is valid:

POSITIVE INTROSPECTION: K_iφ ⊃ K_iK_iφ.

e) The negative introspection axiom is valid:

NEGATIVE INTROSPECTION: ¬K_iφ ⊃ K_i¬K_iφ.

Proof: For part (a), let (r, k) be an (arbitrarily chosen) execution satisfying
r ∈ S, and let φ be a formula that is valid in S. Thus φ is true of all executions
(r', k') ∈ S × N, and, in particular, φ is true of all executions in S × N that are
p_i-equivalent to (r, k). It thus follows that K_iφ is true of (r, k), and since (r, k)
was an arbitrary execution in S × N, we have that K_iφ is valid in S. For (b), let
(S, r, k) ⊨ K_iφ ∧ K_i(φ ⊃ ψ). Then by the definition of K_i we have that both φ
and (φ ⊃ ψ) hold at all points (r', k') that are p_i-equivalent to (r, k). It thus
follows that ψ holds at all such points (r', k'), and again by the definition of
(S, r, k) ⊨ K_iψ we are done. Part (c) follows from the fact that (r, k) ↔_i (r, k),
i.e., p_i-equivalence is reflexive: by definition, if K_iφ is true of (r, k) then φ is
true of all executions that are p_i-equivalent to (r, k), and in particular φ is true
of (r, k). For part (d), let (S, r, k) ⊨ K_iφ. Thus, φ is true of all executions
(r'', k'') ↔_i (r, k). We wish to show that (S, r', k') ⊨ K_iφ for all
(r', k') ↔_i (r, k). Since ↔_i is an equivalence relation, for any such (r', k') all
executions (r'', k'') ∈ S × N satisfy (r, k) ↔_i (r'', k'') iff (r', k') ↔_i (r'', k'').
It thus follows that φ is true of all executions (r'', k'') ↔_i (r', k'), and we are
done. The argument for part (e) is similar. If (S, r, k) ⊨ ¬K_iφ then
(S, r, k) ⊭ K_iφ, and therefore there must be an execution (r'', k'') that is
p_i-equivalent to (r, k) of which φ is not true. Let (r', k') be an execution that is
p_i-equivalent to (r, k). Because p_i-equivalence is an equivalence relation, we have
that (r', k') ↔_i (r'', k''), and hence (S, r', k') ⊨ ¬K_iφ. It now follows that
(S, r, k) ⊨ K_i¬K_iφ and we are done. ∎
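Since the proof uses only the fact that ↔_i is an equivalence relation, the five clauses can be checked by brute force on any small model in which p_i’s knowledge is induced by a view function. The Python sketch below is a hypothetical illustration: it identifies a fact with the set of points at which it holds and checks clauses (a) through (e) for every fact of a finite model; the function names are made up for the example.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Hashable, List

Fact = FrozenSet[Hashable]  # a fact, identified with the set of points where it holds


def all_facts(points: List[Hashable]) -> List[Fact]:
    return [frozenset(c) for c in chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1))]


def K(view: Callable[[Hashable], Hashable], points: List[Hashable], fact: Fact) -> Fact:
    """Points where K_i fact holds: every point with the same view satisfies fact."""
    return frozenset(p for p in points
                     if all(q in fact for q in points if view(q) == view(p)))


def satisfies_s5(view: Callable[[Hashable], Hashable], points: List[Hashable]) -> bool:
    """Brute-force check of clauses (a)-(e) of Proposition 6.1 on a finite model."""
    top = frozenset(points)
    facts = all_facts(points)
    if K(view, points, top) != top:                      # (a) valid facts are known
        return False
    for phi in facts:
        k_phi = K(view, points, phi)
        if not k_phi <= phi:                             # (c) knowledge axiom
            return False
        if not k_phi <= K(view, points, k_phi):          # (d) positive introspection
            return False
        not_k_phi = top - k_phi
        if not not_k_phi <= K(view, points, not_k_phi):  # (e) negative introspection
            return False
        for psi in facts:                                # (b) consequence closure
            phi_implies_psi = (top - phi) | psi
            if not (k_phi & K(view, points, phi_implies_psi)) <= K(view, points, psi):
                return False
    return True


# Five points; p_i's view partitions them into {0, 1}, {2, 3} and {4}.
print(satisfies_s5(lambda p: p // 2, list(range(5))))  # True
```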

Roughly  speaking,  clauses  (a)  through  (e)  characterize  the  modal  system  S5.  An 
operator  satisfying  clauses  (a)  through  (d)  is  said  to  satisfy  the  modal  system  S4  (cf. 
[HM]).  An  interesting  consequence  of  our  choice  of  having  a  failed  processor’s  view 
be  a  distinguished  failed  view  is  the  fact  that  a  processor  always  knows  whether 
it  is  active.  Furthermore,  the  only  things  that  a  failed  processor  knows  are  the 
consequences  of  the  fact  that  the  processor  has  failed  and  of  the  formulas  that  are 
valid  in  S.  Given  that  a  failed  processor  is  “out  of  the  game”  in  our  model,  we  will 
focus  our  attention  on  the  knowledge  of  the  active  processors. 

Having defined knowledge for individual processors, we now extend this definition
to states of group knowledge. Given a group G ⊆ {p_1, …, p_n}, we first define
G’s view at (r, k), denoted v(G, r, k):

$$v(G, r, k) \stackrel{\text{def}}{=} \{\langle p, v(p, r, k)\rangle : p \in G\}$$
Thus, roughly speaking, G’s view is simply the joint view of its members. Extending
our definition of individual knowledge, we say that the group G has implicit
knowledge of φ at (r, k), denoted (S, r, k) ⊨ I_Gφ, if for all runs r' ∈ S satisfying
v(G, r, k) = v(G, r', k) it is the case that (S, r', k) ⊨ φ. Intuitively, G has implicit
knowledge of φ if the joint view of G’s members guarantees that φ holds. Notice
that if processor p knows φ and processor q knows φ ⊃ ψ, then together they have
implicit knowledge of ψ, even if neither of them knows ψ individually. A proof
identical to that of Proposition 6.1 now shows:

Proposition 6.2: The operator I_G satisfies the modal system S5 (clauses (a)
through (e) of Proposition 6.1, substituting I_G for K_i). ∎

We refer the reader to Chapter 2 and [HM] for a discussion and a formal treatment
of I_G. In this chapter we are mainly interested in states of knowledge of the
group A of active processors. We say that the set of active processors implicitly
knows φ, denoted Iφ, exactly if I_Gφ holds for the set G = A. Stated more formally,

$$(S, r, k) \models I\varphi \quad\text{iff}\quad (S, r, k) \models I_G\varphi \ \text{for}\ G = A(r, k).$$

Although Iφ is defined in terms of I_Gφ, it is not the case that I and I_G have the
same properties. The reason for this is that whereas G is a fixed set, membership
in A may vary over time and differs from one run to another. Thus, for example,
it is often the case that for G = A(r, k) we have (S, r, k) ⊭ I_G(A = G), because
there is some run r' ∈ S such that v(G, r, k) = v(G, r', k) and where G is a strict
subset of A(r', k). Consequently, whereas the negative introspection axiom for I_G,
i.e., ¬I_Gφ ⊃ I_G¬I_Gφ, is valid, the corresponding formula for I, ¬Iφ ⊃ I¬Iφ, is
not valid! (Notice, however, that I(G ⊆ A) holds whenever G ⊆ A.) For example,
it may be the case that processor p_j sends processor p_i a message in round 1 stating
p_j’s initial state, and fails before sending any other message, and that processor p_i
fails in round 1 after sending all of its round 1 messages. Processor p_j’s initial state
is thus not implicitly known to the set of active processors, but it is consistent with
the active processors’ joint view that p_i is active, in which case p_j’s initial state
would be implicitly known. The above discussion can be summarized by:

Proposition 6.3: The implicit knowledge operator I satisfies the modal system
S4 (i.e., clauses (a) through (d) of Proposition 6.1). The negative introspection
axiom is not valid for I. ∎
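The counterexample described before Proposition 6.3 can be replayed on a hand-built toy model. The Python sketch below is a hypothetical illustration, not data from the thesis: it keeps only four time-1 points, uses abstract string labels for views, lets index 0 play p_j, index 1 play p_i, and index 2 be a third processor that hears only from p_i in round 1. It contrasts I, whose group is the set of active processors at each point, with I_G for the fixed group G = {2}.

```python
from typing import Callable, Dict, FrozenSet, Iterable

FAILED = "failed"

# Hand-built toy model of four time-1 points (hypothetical labels).
STATE_OF_PJ: Dict[str, int] = {"P1": 0, "P2": 0, "P3": 1, "P4": 1}
VIEWS: Dict[str, Dict[int, str]] = {
    # p_i failed in round 1 after sending all of its round 1 messages
    "P1": {0: FAILED, 1: FAILED, 2: "heard only p_i"},
    # p_i is still active and holds p_j's round 1 message
    "P2": {0: FAILED, 1: "heard p_j: 0", 2: "heard only p_i"},
    "P3": {0: FAILED, 1: FAILED, 2: "heard only p_i"},
    "P4": {0: FAILED, 1: "heard p_j: 1", 2: "heard only p_i"},
}


def active(point: str) -> FrozenSet[int]:
    return frozenset(i for i, v in VIEWS[point].items() if v != FAILED)


def I_G(group: Iterable[int], fact: Callable[[str], bool], point: str) -> bool:
    """Implicit knowledge of a fixed group G."""
    members = sorted(group)

    def joint(q: str) -> tuple:
        return tuple(VIEWS[q][i] for i in members)

    return all(fact(q) for q in VIEWS if joint(q) == joint(point))


def I(fact: Callable[[str], bool], point: str) -> bool:
    """Implicit knowledge of the active processors: G = A at the given point."""
    return I_G(active(point), fact, point)


def phi(q: str) -> bool:         # "p_j started in state 0"
    return STATE_OF_PJ[q] == 0


def not_I_phi(q: str) -> bool:   # the fact ¬Iφ
    return not I(phi, q)


def not_IG_phi(q: str) -> bool:  # the fact ¬I_Gφ for the fixed group G = {2}
    return not I_G({2}, phi, q)


print(not_I_phi("P1"), I(not_I_phi, "P1"))           # True False
print(not_IG_phi("P1"), I_G({2}, not_IG_phi, "P1"))  # True True
```

At P1 the fact ¬Iφ holds but I¬Iφ does not, because P2 looks the same to the only active processor while p_i is active there and holds p_j’s state; for the fixed group G = {2} negative introspection holds, as Proposition 6.2 requires.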


The following lemma describes the relationship between K_i and I:




Lemma 6.4: Let φ be a formula and let p_i ∈ A(r, k).

a) If (S, r, k) ⊨ K_iφ then (S, r, k) ⊨ Iφ.

b) If (S, r, k) ⊨ K_iφ then (S, r, k) ⊨ K_iIφ.

Proof: For part (a), assume that (S, r, k) ⊨ K_iφ, and let (r', k') be an execution
satisfying v(A(r, k), r', k') = v(A(r, k), r, k). In particular, since p_i ∈ A(r, k) we
have that v(p_i, r', k') = v(p_i, r, k), and thus since K_iφ holds at (r, k), we have that
φ holds at (r', k'). Since this is true for all such executions (r', k'), we are done by
the definition of (S, r, k) ⊨ Iφ. For (b), let (S, r, k) ⊨ K_iφ. By Proposition 6.1(d)
we have that (S, r, k) ⊨ K_iK_iφ. The fact that p_i ∈ A(r, k) implies that
v(p_i, r, k) ≠ failed. Thus, p_i is an active processor in all executions that are
p_i-equivalent to (r, k). Let (r', k') ↔_i (r, k). We thus have that p_i ∈ A(r', k'),
and that K_iφ holds at (r', k'). Part (a) therefore implies that Iφ holds at (r', k'),
and thus K_iIφ holds at (r, k). ∎

We now show that, roughly speaking, in t-uniform systems once a fact about the
past is not implicitly known it is lost forever; it will not become implicit knowledge
at a later time. We say that a fact ψ is about the first k rounds if for all runs r ∈ S
it is the case that (S, r, k) ⊨ ψ iff (S, r, ℓ) ⊨ ψ for all ℓ ≥ k. In particular, facts
about the first 0 rounds are facts about the initial configuration. We now have:

Theorem 6.5: Let S be a t-uniform system, let ψ be a fact about the first k
rounds, and let ℓ ≥ k. If (S, r, k) ⊭ Iψ then (S, r, ℓ) ⊭ Iψ.

Proof: Let ℓ ≥ k, and let r and ψ be such that ψ is about the first k rounds
and (S, r, k) ⊭ Iψ. The case ℓ = k is immediate, so assume that ℓ > k. Let
G = A(r, k). It follows that there exists a run r' ∈ S such that v(G, r, k) = v(G, r', k),
and (S, r', k) ⊭ ψ. Let r'' be a run with the following properties: (i) (r'', k) = (r', k);
(ii) all processors in A(r', k) − G fail in round k + 1 of r'' before sending any
messages; and (iii) from round k + 1 on, all processors in G behave in r'' exactly as
they do in r. By construction, the number of processors that fail by time k in r'' is
no larger than the number in r, and exactly the same processors fail in r and in r''
by any later time. Given that S is a t-uniform system and r ∈ S, no more than t
processors fail in r. It follows that r'' ∈ S, since all of the processors follow the
same protocol in r'' and in r, and no more than t processors fail in r''. By
construction of r'' we also have that A(r'', ℓ) = A(r, ℓ) and that the active
processors have identical views in (r'', ℓ) and in (r, ℓ). It follows that
(S, r'', ℓ) ⊨ Iψ iff (S, r, ℓ) ⊨ Iψ. Since ψ is a fact about the first k rounds and
(r'', k) = (r', k), we have that (S, r'', ℓ) ⊭ ψ because (S, r', k) ⊭ ψ. Thus, by the
knowledge axiom for I, in particular (S, r'', ℓ) ⊭ Iψ, and it follows that
(S, r, ℓ) ⊭ Iψ and we are done. ∎




Fagin  and  Vardi  perform  an  interesting  analysis  of  implicit  knowledge  in  reliable 
systems (cf. [FV]). Among other things, they prove that the set of facts about the
initial configuration that are implicit knowledge does not change with time. That is,
in reliable systems the implication in the statement of Theorem 6.5 becomes an
equivalence. However, in t-uniform Byzantine systems implicit knowledge can clearly
be “lost”. For example, if processor p_i may start in initial states σ and σ', and in a
particular run of the system p_i starts in state σ and fails in the first round before
sending any messages, then whereas I(“p_i started in state σ”) holds at time 0, it
does not hold at any later time.

We  now  introduce  the  two  other  states  of  group  knowledge  that  are  central  to 
our  analysis.  We  define  “everyone  knows”  and  “common  knowledge”  along  the  lines 
of  Chapter  2.  In  our  case,  however,  these  notions  will  be  defined  for  the  set  of  active 
processors. Every (active) processor knows φ, denoted Eφ, is defined by

$$E\varphi \stackrel{\text{def}}{=} \bigwedge_{1 \le i \le n} \bigl(p_i \in A \supset K_i\varphi\bigr).$$

An  immediate  corollary  of  Lemma  6.4  which  we  will  find  useful  in  the  sequel  is: 

Corollary 6.6: Eφ ⊃ E(Iφ) is valid. ∎

We define E^1φ ≝ Eφ, and E^{m+1}φ ≝ E(E^mφ) for m ≥ 1. A fact φ is said to
be common knowledge among the active processors, denoted Cφ, if E^mφ holds for
all m ≥ 1. More formally,

$$C\varphi \equiv \varphi \wedge E\varphi \wedge E^2\varphi \wedge \cdots \wedge E^m\varphi \wedge \cdots$$

Common knowledge among the active processors, which we will call simply common
knowledge, will play a crucial role in the sequel. We now study some of its
properties. A useful tool for thinking about E^mφ and Cφ is the labelled undirected
graph whose nodes are the executions of a system S, and whose edges are the ↔_i
relations, restricted so that an edge e ↔_i e' exists only if p_i is active in e (and
hence also in e'). (This graph is precisely the Kripke structure modelling the active
processors’ knowledge in the system; cf. [HM].) The distance between two executions
e = (r, k) and e' = (r', k) in this graph, denoted δ(e, e'), is simply the length of the
shortest path in the graph connecting e and e'. If there is no path connecting e
to e', then δ(e, e') is defined to be infinity. Two executions e and e' are said to be
similar, denoted e ~ e', if δ(e, e') is finite (i.e., if e' and e are in the same connected
component of the graph). Equivalently, (r, k) ~ (r', k) if for some finite m there are
runs r_1, r_2, …, r_{m−1} ∈ S, and processors p_{i_1}, p_{i_2}, …, p_{i_m}, satisfying
p_{i_j} ∈ A(r_j, k) for j ≤ m − 1, p_{i_m} ∈ A(r', k), and

$$(r, k) \leftrightarrow_{i_1} (r_1, k) \leftrightarrow_{i_2} \cdots \leftrightarrow_{i_{m-1}} (r_{m-1}, k) \leftrightarrow_{i_m} (r', k).$$

(The system S is usually clear from context, and thus we do not add a subscript S
to ~.) It is now easy to check that (S, r, k) ⊨ Eφ iff (S, r', k) ⊨ φ for all executions
(r', k) of distance ≤ 1 from (r, k). Notice that similarity is an equivalence relation.
We  can  now  show: 

Proposition 6.7:

a) (S, r, k) ⊨ Cφ iff (S, r', k) ⊨ φ for all r' ∈ S such that (r, k) ~ (r', k).

b) If φ is valid in S then Cφ is valid in S.

c) C satisfies the axioms of the modal system S5 (see Proposition 6.1).

d) The induction axiom is valid:

INDUCTION AXIOM: C(φ ⊃ Eφ) ⊃ (φ ⊃ Cφ).

e) The fixpoint axiom is valid:

FIXPOINT AXIOM: Cφ ⊃ φ ∧ ECφ.

Proof: Part (a) follows by a straightforward induction on m showing that
(S, r, k) ⊨ E^mφ iff (S, r', k) ⊨ φ for all (r', k) of distance ≤ m from (r, k). Part (b)
follows directly from (a). The proof of part (c) is identical to the proof of
Proposition 6.1, substituting C for K_i and ~ for ↔_i. For (d), assume that both φ
and C(φ ⊃ Eφ) hold at e = (r, k). We prove by induction on m that φ holds at all
points of distance ≤ m from e. The case m = 0 follows from our initial assumption.
Assume that the claim holds for m, and let e' be a point satisfying δ(e, e') = m + 1.
It follows that there is a point e'' such that δ(e, e'') = m and δ(e'', e') = 1. By the
inductive hypothesis φ holds at e''. Since C(φ ⊃ Eφ) holds at e and e ~ e'', part (a)
implies that φ ⊃ Eφ holds at e''. It follows that Eφ holds at e'', and since
δ(e'', e') = 1, we have that φ holds at e'. By induction we have that φ holds at all
points reachable from (i.e., similar to) e, and by part (a) we have that Cφ holds at
e, and we are done. For part (e), the validity of Cφ ⊃ φ is immediate. By part (c)
we have that C satisfies the positive introspection axiom, and hence Cφ ⊃ CCφ is
valid. By definition of Cψ we have that Cψ ⊃ Eψ is valid, and taking ψ = Cφ, we
have that CCφ ⊃ ECφ is valid. Combining the two, Cφ ⊃ ECφ is valid, and we are
done. ∎
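On a finite model, Proposition 6.7(a) can be compared directly with the definition of C as an infinite conjunction. The Python sketch below is a hypothetical illustration: a four-point toy model (all points at the same time, both processors active everywhere) in which C is computed once by iterating E and once as “φ holds throughout the similarity class”, found by breadth-first search in the graph described above. Because every point has an active processor, E(φ) ⊆ φ here, so len(POINTS) iterations suffice.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set

# Hypothetical toy model: VIEWS[i][e] is p_i's view at point e.
POINTS = ["a", "b", "c", "d"]
ACTIVE: Dict[str, Set[int]] = {e: {0, 1} for e in POINTS}
VIEWS: Dict[int, Dict[str, Hashable]] = {
    0: {"a": "x", "b": "x", "c": "y", "d": "z"},  # p_0 cannot tell a from b
    1: {"a": "u", "b": "w", "c": "w", "d": "v"},  # p_1 cannot tell b from c
}
Fact = FrozenSet[str]


def E(fact: Fact) -> Fact:
    """Points where every active processor knows fact."""
    return frozenset(e for e in POINTS if all(
        all(q in fact for q in POINTS if VIEWS[i][q] == VIEWS[i][e])
        for i in ACTIVE[e]))


def C_by_iteration(fact: Fact) -> Fact:
    """fact ∧ E fact ∧ E^2 fact ∧ ...; here len(POINTS) terms suffice."""
    result, current = fact, fact
    for _ in range(len(POINTS)):
        current = E(current)
        result = result & current
    return result


def similarity_class(e: str) -> Set[str]:
    """Breadth-first search along edges labelled by processors active at the endpoint."""
    seen, todo = {e}, deque([e])
    while todo:
        x = todo.popleft()
        for i in ACTIVE[x]:
            for q in POINTS:
                if q not in seen and VIEWS[i][q] == VIEWS[i][x]:
                    seen.add(q)
                    todo.append(q)
    return seen


def C_by_graph(fact: Fact) -> Fact:
    """Proposition 6.7(a): C fact holds at e iff fact holds at every point similar to e."""
    return frozenset(e for e in POINTS if similarity_class(e) <= fact)


psi = frozenset({"a", "b"})
print(sorted(E(psi)), sorted(C_by_graph(psi)))  # ['a'] []
for r in range(len(POINTS) + 1):                # the two characterizations agree
    for s in combinations(POINTS, r):
        assert C_by_graph(frozenset(s)) == C_by_iteration(frozenset(s))
```

The fact {a, b} is known to everyone at a, yet it is not common knowledge there, since a is similar to c (via b) and the fact fails at c.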
Proposition 6.7 is very useful in relating common knowledge and actions that
are guaranteed to be performed simultaneously. For example, we can use
Proposition 6.7(b) and the induction axiom in order to relate the ability or inability
to attain common knowledge of certain facts with the possibility or impossibility of
reaching simultaneous Byzantine agreement. We model a processor’s “deciding v”
by the processor sending the message “the decision value is v” to itself, and have:

Corollary 6.8: Let S be a system in which the processors follow a protocol for
SBA. If the active processors decide on a value v at (r, k), then

a) (S, r, k) ⊨ C(“all processors are deciding v”), and

b) (S, r, k) ⊨ C(“at least one processor had v as its initial value”).

Proof: Let φ be the fact “all processors are deciding v”. Given that the protocol
guarantees that SBA is attained in S, it is the case that whenever some processor
decides v all active processors do, and thus the formula φ ⊃ Eφ is valid in S. Thus,
by Proposition 6.7(b) C(φ ⊃ Eφ) is also valid in S. The induction axiom states
that C(φ ⊃ Eφ) ⊃ (φ ⊃ Cφ). Combining these two facts we have that φ ⊃ Cφ
is valid in S, and thus if (S, r, k) ⊨ φ then (S, r, k) ⊨ Cφ and we are done with
part (a). For (b), let ψ be “at least one processor had v as its initial value”, and
notice that SBA guarantees that φ ⊃ ψ is valid in S. Thus, by Proposition 6.7(b), so
is C(φ ⊃ ψ). The consequence closure axiom states that (Cφ ∧ C(φ ⊃ ψ)) ⊃ Cψ
is valid, and we conclude that Cφ ⊃ Cψ is valid in S. By part (a) we have that
(S, r, k) ⊨ φ implies that (S, r, k) ⊨ Cφ, from which we can now conclude that
(S, r, k) ⊨ Cψ and we are done. ∎

The reasoning used in the proof of Corollary 6.8 is typical of the way
Proposition 6.7(a) and (b), together with the consequence closure and induction
axioms, are used to prove that certain facts are common knowledge. We will use such
reasoning again in later proofs.
6.3 Analysis of a simple protocol

In this section we take a close look at t-uniform systems S_𝒫 in which all
processors follow a simple and fairly general protocol 𝒫:

For k ≥ 0, in round k + 1 each processor sends its view at time k
(i.e., after k rounds) to all other processors.

This  protocol  was  named  the  maximal  information  protocol  by  Hadzilacos  (cf. 
[H]).  We  are  interested  in  determining  what  facts  about  the  run  become  common 
knowledge  at  the  different  stages  of  the  execution  of  this  protocol.  Intuitively,  the 
protocol 𝒫 should provide the processors with “as much knowledge as possible”
about  the  initial  configuration  and  the  pattern  of  failures,  and  should  facilitate  the 
ability  of  the  system  to  perform  actions  that  depend  on  the  initial  configuration. 
One  of  the  relevant  properties  of  this  protocol  is  that  it  requires  every  processor  to 
send  messages  to  all  other  processors  in  every  round.  This  ensures  that  the  failure 
of  a  processor  will  be  known  to  all  processors  at  most  one  round  after  the  round  in 
which  the  processor  fails. 
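Both the protocol 𝒫 and the claim that a failure becomes known to every active processor within one round can be sketched in a few lines. This is a hypothetical Python illustration, not the formal model: processors are indexed from 0, a view is encoded as the nested tuple (initial state, time, previous view, set of (sender, sender’s view) pairs received in the latest round), and a failure pattern maps a processor to (k(p), Q(p)). Because every processor must send to every other processor in every round, a missing message can only be explained by a failure of its sender.

```python
from typing import Dict, List, Set, Tuple

FAILED = "failed"
Pattern = Dict[int, Tuple[int, Set[int]]]  # p -> (k(p), Q(p))


def max_info_run(sigma: List[int], pi: Pattern, rounds: int) -> List[Dict[int, object]]:
    """views[k][i] when, in round k + 1, every processor sends its time-k view to all."""
    n = len(sigma)
    views: List[Dict[int, object]] = [{i: (sigma[i], 0, None, frozenset()) for i in range(n)}]
    for k in range(1, rounds + 1):
        prev = views[-1]
        cur: Dict[int, object] = {}
        for i in range(n):
            if prev[i] == FAILED or (i in pi and pi[i][0] == k):
                cur[i] = FAILED
                continue
            received = frozenset(
                (j, prev[j]) for j in range(n)
                if j != i and prev[j] != FAILED
                and not (j in pi and pi[j][0] == k and i in pi[j][1]))
            cur[i] = (sigma[i], k, prev[i], received)
        views.append(cur)
    return views


def known_failed(i: int, view, n: int) -> Set[int]:
    """Processors missing as senders in some round recorded in p_i's view."""
    out: Set[int] = set()
    while view is not None and view[1] > 0:
        senders = {j for j, _ in view[3]}
        out |= set(range(n)) - senders - {i}
        view = view[2]
    return out


# Index 0 fails in round 1, omitting only its message to index 2.
views = max_info_run([0, 1, 1], {0: (1, {2})}, rounds=2)
print(known_failed(1, views[1][1], 3), known_failed(2, views[1][2], 3))  # set() {0}
print(known_failed(1, views[2][1], 3), known_failed(2, views[2][2], 3))  # {0} {0}
```

At time 1 only index 2 can tell that index 0 has failed; at time 2, one round after the failing round, both active processors know it.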

A fact φ is called stable if once it becomes true it remains true forever (cf.
Chapter 2). For example, facts about the first k rounds, and in particular facts
about the system’s initial configuration, are stable. Since a processor’s knowledge
is based on the processor’s view, and an active processor’s view grows monotonically
with time, it is the case that if φ is stable then (as long as at least one processor
remains active) so are Eφ and Cφ. As we have seen, Iφ is not necessarily stable.

A round in which no processor fails is called a clean round. A round that is not
clean is called dirty. If every processor that fails in round k fails to send round k
messages only to processors that are not active at time k, then round k is said
to be seemingly clean. Notice that a clean round is in particular seemingly clean,
and the active processors cannot distinguish between a clean round and a seemingly
clean round. The reason we are interested in seemingly clean rounds is that if, for
some k, round k of a run in which the processors all follow 𝒫 is seemingly clean,
then every active processor’s view at the end of round k includes the views of all
processors active at time k − 1. In particular it follows that any stable fact that is
implicit knowledge at time k − 1 is known to everyone at time k. Consequently, at
time k all active processors know exactly the same facts about the initial
configuration. Furthermore, Theorem 6.5, together with the fact that Eφ is stable
when φ is, implies that at any point after a clean round, all of the active processors
have identical knowledge about the initial configuration. Therefore, once it is common
knowledge that there was a clean round, it is common knowledge that the processors
have an identical view of the initial configuration. The above discussion is made
precise by the following theorem:

Theorem 6.9: Assume that t < n − 1.

a) Let φ be a stable fact such that (S_𝒫, r, k − 1) ⊨ Iφ.

If round k of r is seemingly clean then (S_𝒫, r, k) ⊨ Eφ.

b) Let φ be a fact about the initial configuration.

If (S_𝒫, r, ℓ) ⊨ C(“a seemingly clean round has occurred”) then (S_𝒫, r, ℓ) ⊨ Iφ
iff (S_𝒫, r, ℓ) ⊨ Cφ.

Proof: By definition, (S_𝒫, r, k − 1) ⊨ Iφ iff (S_𝒫, r, k − 1) ⊨ I_Gφ for
G = A(r, k − 1). If round k is seemingly clean then all processors active at time k
receive round k messages from all of the processors in G, and hence the view of each
active processor at time k has a copy of v(G, r, k − 1). Since φ is stable, it follows
that every active processor at time k knows φ. For part (b), let φ be a fact about
the initial configuration and let ψ be the fact “a seemingly clean round has
occurred”. Let (r', ℓ) be an execution satisfying (S_𝒫, r', ℓ) ⊨ ψ, and assume that
(S_𝒫, r', ℓ) ⊨ Iφ. By Theorem 6.5, we then have (S_𝒫, r', k) ⊨ Iφ for all k ≤ ℓ.
Given that ψ holds at (r', ℓ), let round k of r' be a seemingly clean round, where
k ≤ ℓ. Since (S_𝒫, r', k − 1) ⊨ Iφ, by part (a) we have that (S_𝒫, r', k) ⊨ Eφ. Eφ
is stable because φ is, and therefore (S_𝒫, r', ℓ) ⊨ Eφ. By Corollary 6.6 we have that
(S_𝒫, r', ℓ) ⊨ E(Iφ). We have just shown that ψ ⊃ (Iφ ⊃ E(Iφ)) is valid in S_𝒫.
Thus, by Proposition 6.7(b) we have that C(ψ ⊃ (Iφ ⊃ E(Iφ))) is also valid in S_𝒫.
Now assume that r is a run satisfying (S_𝒫, r, ℓ) ⊨ Cψ. By the consequence closure
axiom for C (Proposition 6.7(c)), we have that (S_𝒫, r, ℓ) ⊨ C(Iφ ⊃ E(Iφ)). And
by the induction axiom we have that (S_𝒫, r, ℓ) ⊨ Iφ ⊃ C(Iφ). Since C(Iφ) ⊃ Cφ
is valid, we also have that (S_𝒫, r, ℓ) ⊨ Iφ ⊃ Cφ. Finally, since Cφ ⊃ Iφ is valid,
we have that (S_𝒫, r, ℓ) ⊨ Iφ ≡ Cφ, and we are done. ∎

As  a  corollary  of  Theorem  6.9  we  can  now  show: 

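Corollary 6.10: Let $\varphi$ be a fact about the initial configuration.

a) $(\mathcal{S}^{\mathcal{P}}, r, t+1) \models I\varphi$ iff $(\mathcal{S}^{\mathcal{P}}, r, t+1) \models C\varphi$.

b) $(\mathcal{S}^{\mathcal{P}}, r, n-1) \models I\varphi$ iff $(\mathcal{S}^{\mathcal{P}}, r, n-1) \models C\varphi$.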

Proof: Notice that the "if" direction in both cases is immediate, since $C\psi \supset I\psi$ is valid for all facts $\psi$. We now show the other direction. All runs of $\mathcal{S}^{\mathcal{P}}$ have the property that no more than $t$ processors fail during the run. Since each processor failure occurs in a single round, one of the first $t+1$ rounds of every run of $\mathcal{S}^{\mathcal{P}}$ must be clean. Since a clean round is in particular seemingly clean, Proposition 6.7(b) implies that at time $t+1$ it is common knowledge in all runs of $\mathcal{S}^{\mathcal{P}}$ that a seemingly clean round has occurred. Part (a) now follows from Theorem 6.9(b). For the proof of part (b), we need a slightly stronger variant of Theorem 6.9(b), which states that if it is common knowledge that either there has been a clean round or there is at most one active processor, then $I\varphi$ holds iff $C\varphi$ does. The proof of this fact is completely analogous to that of Theorem 6.9(b), given that $I\varphi \equiv C\varphi$ is trivially true when there is at most one active processor. Part (b) then follows because, within the first $n-1$ rounds of any run, either some round is clean or every one of these rounds contains a failure, in which case at most one processor is active at time $n-1$. □
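The counting step in this proof (at most $t$ failures, each occurring in a single round, leave one of rounds $1, \ldots, t+1$ clean) can be checked exhaustively for small $t$. The following Python sketch is only an illustration under an assumed abstraction: a failure pattern is reduced to the list of rounds in which the faulty processors fail, and the helper names `first_clean_round` and `check_pigeonhole` are hypothetical.

```python
from itertools import combinations_with_replacement


def first_clean_round(fail_rounds, t):
    """Return the first round in 1..t+1 in which no processor fails.

    fail_rounds lists, for each faulty processor, the round in which it fails
    (a hypothetical abstraction of a failure pattern that keeps only rounds).
    Returns None if every one of rounds 1..t+1 contains a failure.
    """
    failed_in = set(fail_rounds)
    for rnd in range(1, t + 2):
        if rnd not in failed_in:
            return rnd
    return None


def check_pigeonhole(t):
    """Exhaustively check that with at most t failures some round in 1..t+1 is clean."""
    for f in range(t + 1):  # number of processors that fail
        # Failures after round t+1 cannot affect rounds 1..t+1, so rounds are drawn from 1..t+1.
        for rounds in combinations_with_replacement(range(1, t + 2), f):
            if first_clean_round(rounds, t) is None:
                return False
    return True


print(first_clean_round([1, 2, 2, 4], t=4))      # 3
print(all(check_pigeonhole(t) for t in range(6)))  # True
```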

As a consequence of Theorem 6.9 and Corollary 6.10, any action that depends on the system's initial configuration can be carried out simultaneously and consistently by the set of active processors at any time $k \ge \min\{t+1, n-1\}$. This is consistent with the existence of well-known $t$-resilient protocols that attain SBA in $t+1$ rounds. Interestingly, none of the known protocols for SBA attains SBA in fewer than $t+1$ rounds in any run. It is therefore natural to ask whether a protocol for SBA can ever attain SBA in fewer than $t+1$ rounds. Clearly, once it is common knowledge that a clean round has occurred, SBA can be attained. And as we shall see, there are cases in which the existence of a clean round becomes common knowledge before time $t+1$. When the existence of a clean round becomes common knowledge depends crucially on the pattern of failures, and on the time at which failures become implicitly known to the group of active processors. For example, if a processor $p$ detects $t$ failures in the first round of a run of $\mathcal{P}$, then the second round of the run will be clean, and at the end of the second round all active processors will know that $p$ detected $t$ failures in round 1. It follows from the induction axiom and Proposition 6.7(b) that at the end of round 2 it will be common knowledge that all processors have an identical view of the initial configuration (as the reader can check). Clearly, the processors can then perform any action that depends on the initial configuration (e.g., SBA) in a consistent way. In the remainder of this section we exhibit, for every $k$ between 2 and $t+1$, a class of runs of $\mathcal{S}^{\mathcal{P}}$ in which the processors attain common knowledge of an identical view of the initial configuration at time $k$. In the next section, we will prove that this is in fact a precise classification of the runs according to the time at which common knowledge of an identical view of the initial configuration is attained.
Intuitively, if there are more than $k$ failures by the end of round $k$, then from the point of view of the ability to delay the first clean round, failures have been "wasted". In particular, if for some $k$ there are $k+j$ failures by the end of round $k$, then there must be a clean round by time $t+1-j$ (in fact, between round $k+1$ and round $t+1-j$), since at most $t-k-j$ failures remain for these $t+1-j-k$ rounds. This motivates the following definitions. We denote the number of processors that fail by time $k$ in $r$ by $N(r,k)$. We define the difference at $(r,k)$, denoted $d(r,k)$, by

$$d(r,k) \stackrel{\text{def}}{=} N(r,k) - k.$$

We also define the maximal difference in $(r,\ell)$, denoted $D(r,\ell)$, by

$$D(r,\ell) \stackrel{\text{def}}{=} \max_{0 \le k \le \ell} d(r,k).$$

Observe that $d(r,0) = 0$ for all runs $r$, since $N(r,0) = 0$. Furthermore, in a $t$-uniform system it is always the case that $d(r,k) \le t-k$, since $N(r,k) \le t$. Let $D$ be a variable whose value at a point $(r,k)$ is $D(r,k)$, and let $d(k)$ be a variable whose value at any point in $r$ is $d(r,k)$. By Theorem 6.9(b), if at time $t+1-j$ it is common knowledge that $D \ge j$, then it is common knowledge that a clean round has occurred, and that all processors have an identical view of the initial configuration. We are about to show that the protocol $\mathcal{P}$ guarantees that if it ever becomes implicit knowledge that $D \ge j$, then at time $t+1-j$ it is common knowledge that $D \ge j$ (and, therefore, that a clean round has occurred). This leads us to the following definition: Given a system $\mathcal{S}$, the wastefulness of $(r,\ell)$ with respect to $\mathcal{S}$, denoted $W(\mathcal{S},r,\ell)$, is defined by:

$$W(\mathcal{S},r,\ell) \stackrel{\text{def}}{=} \max\{\, j : (\mathcal{S},r,\ell) \models I(D \ge j) \,\}.$$

In words, the wastefulness of $(r,\ell)$ is the maximal value that the difference $d(r,\cdot)$ is implicitly known to have assumed by time $\ell$. Finally, we define the wastefulness of a run $r$, denoted $W(\mathcal{S},r)$, by:

$$W(\mathcal{S},r) \stackrel{\text{def}}{=} \max_{\ell \ge 0} W(\mathcal{S},r,\ell).$$
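The quantities $N(r,k)$, $d(r,k)$ and $D(r,\ell)$ depend only on when processors fail, so they are easy to compute directly. The sketch below is a minimal illustration, assuming a hypothetical representation `fail_round` that maps each faulty processor to the round in which it fails; it does not model implicit knowledge, so it computes $D$ but not the wastefulness $W$. It also checks, on one example, the earlier claim that $d(r,k) \ge j$ forces a clean round between round $k+1$ and round $t+1-j$.

```python
def num_failed(fail_round, k):
    """N(r, k): number of processors that have failed by time k (rounds 1..k)."""
    return sum(1 for rnd in fail_round.values() if rnd <= k)


def difference(fail_round, k):
    """d(r, k) = N(r, k) - k."""
    return num_failed(fail_round, k) - k


def max_difference(fail_round, ell):
    """D(r, ell) = max of d(r, k) over 0 <= k <= ell."""
    return max(difference(fail_round, k) for k in range(ell + 1))


def has_clean_round(fail_round, lo, hi):
    """True if some round in lo..hi contains no failure."""
    failed_in = set(fail_round.values())
    return any(rnd not in failed_in for rnd in range(lo, hi + 1))


# Hypothetical run with t = 3: p2 and p3 fail in round 1, p4 in round 2.
t = 3
fr = {"p2": 1, "p3": 1, "p4": 2}
print([difference(fr, k) for k in range(5)])  # [0, 1, 1, 0, -1]
print(max_difference(fr, 4))                   # 1

# "Wasted" failures: if d(r, k) >= j >= 0, some round in k+1 .. t+1-j is clean.
for k in range(t + 1):
    j = difference(fr, k)
    if j >= 0:
        assert has_clean_round(fr, k + 1, t + 1 - j)
```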

We now formally prove the claims informally stated above. We start with a somewhat technical lemma discussing the properties of wastefulness in the case of $\mathcal{S}^{\mathcal{P}}$:
Lemma 6.11: Let $t \le n-1$.

a) If $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(D \ge j)$, then $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(d(k) \ge j)$ for some $k \le \ell$.

b) If $I(d(k) \ge j)$ holds at time $k$, then at time $k+1$ either $E(d(k) \ge j)$ holds or $I(d(k+1) \ge j)$ does.

c) $W(\mathcal{S}^{\mathcal{P}}, r, k+1) \ge W(\mathcal{S}^{\mathcal{P}}, r, k)$ for all $k \ge 0$.

Proof: For part (a), let $r \in \mathcal{S}^{\mathcal{P}}$ satisfy $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(D \ge j)$, and assume that for no $k$ is it the case that $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(d(k) \ge j)$. Let $r'$ be a run of $\mathcal{P}$ such that $(r', 0) = (r, 0)$, and in which the only messages not delivered are those that are implicitly known at $(r, \ell)$ not to have been delivered. It is easy to check that $r' \in \mathcal{S}^{\mathcal{P}}$, since no more than $t$ processors fail in $r'$. Because it is not implicit knowledge at $(r, \ell)$ that $d(k) \ge j$ for any $k$, it follows that $D(r', \ell) < j$. If we show that the group $G = A(r, \ell)$ has exactly the same view in $(r, \ell)$ and in $(r', \ell)$, we will be done, since this will contradict the assumption that $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(D \ge j)$. We now prove that $A(r, \ell)$ has the same view in $(r, \ell)$ and in $(r', \ell)$. This is done by showing by induction on $k$ that the processors that are implicitly known at $(r, \ell)$ to have been active at time $k \le \ell$ have the same views at time $k$ in both $r$ and $r'$. Define $G(\ell) = A(r, \ell)$. For $k < \ell$, assume inductively that $G(k+1)$ is defined, and for all processors $p_i \in G(k+1)$ let $g(p_i, k)$ be the set of processors from which $p_i$ receives a message in round $k+1$ of $r$. Define

$$G(k) \stackrel{\text{def}}{=} \bigcup_{p_i \in G(k+1)} g(p_i, k).$$

Let $G'(\ell) = G(\ell)$, and for $k < \ell$ define $g'(p_i, k)$ and $G'(k)$ from $G'(k+1)$ in an analogous fashion (substituting $G'$, $g'$, and $r'$ for $G$, $g$, and $r$). We now show by induction on $\ell - k$ that if $k < \ell$, then $g(p_i, k) = g'(p_i, k)$ for all $p_i \in G(k+1)$, and $G(k) = G'(k)$. Let $k < \ell$ and assume inductively that $G(k+1) = G'(k+1)$. (Notice that we have defined $G(\ell) = G'(\ell)$.) Let $p_i \in G(k+1)$. The sets $G(k)$ are the sets of processors implicitly known at $(r, \ell)$ to have been active at time $k$. The sets $g(p_i, k-1)$ are the sets of processors that send a message to $p_i$ in round $k$. By requiring messages to contain the sender's complete view, the protocol $\mathcal{P}$ guarantees that a processor is implicitly known at $(r, \ell)$ to have been active at time $k$ iff the processor's view at $(r, k)$ is implicitly known. Thus, the precise identity of $g(p_i, k)$ for $p_i \in G(k+1)$ is implicitly known at $(r, \ell)$. It follows that processor $p_j$ sends a message to $p_i$ in round $k+1$ of $r$ iff $p_j$ sends $p_i$ a round $k+1$ message in $r'$. It thus follows that $g(p_i, k) = g'(p_i, k)$. Since this is true for all $p_i \in G(k+1)$, we have that $G(k) = G'(k)$, and the claim is proven. Notice that $G(k) \supseteq G(k+1)$. We now show by induction on $k$ that $v(p_i, r, k) = v(p_i, r', k)$ for all $p_i \in G(k)$. The case $k = 0$ follows from the fact that $(r, 0) = (r', 0)$ and $G(0) = G'(0)$. Assume inductively that the claim holds for $k$; we prove it for $k+1$. Let $p_i \in G(k+1)$. Observe that $p_i$'s view at $(r, k+1)$ is determined by its view at $(r, k)$ and by the view of the group $g(p_i, k)$ at $(r, k)$. Since $g(p_i, k) = g'(p_i, k)$, and by the inductive hypothesis $v(g(p_i, k), r, k) = v(g'(p_i, k), r', k)$ and $v(p_i, r, k) = v(p_i, r', k)$, it follows that $v(p_i, r, k+1) = v(p_i, r', k+1)$. It now follows that $v(G(\ell), r, \ell) = v(G(\ell), r', \ell)$, and we are done with part (a).

For part (b), assume that $(\mathcal{S}^{\mathcal{P}}, r, k) \models I(d(k) \ge j)$. If round $k+1$ is seemingly clean, then $E(d(k) \ge j)$ holds at $(r, k+1)$ by Theorem 6.9(a), since $d(k) \ge j$ is stable. Otherwise there must be (at least one) processor, say $q$, that fails in round $k+1$ by not sending a message to at least one processor, say $p$, that is active at time $k+1$. Thus, in particular, $p$ knows at time $k+1$ that $q$ has failed. Now, by requiring all processors to send messages to all of the other processors in every round, $\mathcal{P}$ ensures that all processors that fail by $(r, k)$ are known by everyone at $(r, k+1)$ to have failed. Since $q$ fails in round $k+1$, it is not among these processors, so at least $(k+j)+1$ failures by time $k+1$ are implicitly known. It follows that $d(k+1) \ge j$ is implicit knowledge at that time.

For part (c), assume that $W(\mathcal{S}^{\mathcal{P}}, r, k) = j$. Then by part (a) there is some $k' \le k$ such that $(\mathcal{S}^{\mathcal{P}}, r, k) \models I(d(k') \ge j)$. Without loss of generality let $k'$ be the largest such number. If $k' < k$, then by (b) we have that at time $k'+1 \le k$ everyone knows that $d(k') \ge j$. But $E(d(k') \ge j)$ is a stable fact because $d(k') \ge j$ is, so in this case $W(\mathcal{S}^{\mathcal{P}}, r, k+1) \ge j$, and the claim of (c) holds. If $k' = k$, then part (b) implies that at time $k+1$ either everyone will know that $d(k) \ge j$ or it will be implicit knowledge that $d(k+1) \ge j$. In both cases we will have $W(\mathcal{S}^{\mathcal{P}}, r, k+1) \ge j$, and we are done. □

We  now  have: 

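Theorem 6.12: Let $t \le n-1$.

a) $W(\mathcal{S}^{\mathcal{P}}, r) \ge j$ iff $(\mathcal{S}^{\mathcal{P}}, r, t+1-j) \models C(W(\mathcal{S}^{\mathcal{P}}, \text{``the current run''}) \ge j)$.

b) Let $\varphi$ be a fact about the initial configuration. If $W(\mathcal{S}^{\mathcal{P}}, r) = j$, then

$(\mathcal{S}^{\mathcal{P}}, r, t+1-j) \models I\varphi$ iff $(\mathcal{S}^{\mathcal{P}}, r, t+1-j) \models C\varphi$.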
Proof: The "if" direction of part (a) is immediate from the fact that $C\varphi \supset \varphi$ is valid. We now show the other direction. Assume that $W(\mathcal{S}^{\mathcal{P}}, r) \ge j$. We claim that there must be a seemingly clean round between the first time at which $I(D \ge j)$ holds and time $t+1-j$. For some $\ell \ge 0$ it must be the case that $W(\mathcal{S}^{\mathcal{P}}, r, \ell) \ge j$, and hence $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(D \ge j)$. By Lemma 6.11(a) there is some $k \le \ell$ for which $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I(d(k) \ge j)$. Let $k'$ be the largest such $k$. Since $d(k') \ge j$ is a fact about the first $k'$ rounds, we have by Theorem 6.5 that $(\mathcal{S}^{\mathcal{P}}, r, k') \models I(d(k') \ge j)$. Since $d(k') \ge j$ implies that at least $k'+j$ processors must have failed by time $k'$, we have that $k' \le t-j$. Furthermore, $(\mathcal{S}^{\mathcal{P}}, r, k'+1) \not\models I(d(k'+1) \ge j)$ implies that no new processor failure becomes visible to the active processors in round $k'+1$, which implies that round $k'+1$ must be seemingly clean. Since "$d(k') \ge j$" is a stable fact, it follows from Theorem 6.9(a) that $(\mathcal{S}^{\mathcal{P}}, r, k'+1) \models E(d(k') \ge j)$, and hence that $(\mathcal{S}^{\mathcal{P}}, r, \ell') \models E(d(k') \ge j)$ for all $\ell' \ge k'+1$. In particular, since $t+1-j \ge k'+1$, we have that $(\mathcal{S}^{\mathcal{P}}, r, t+1-j) \models E(d(k') \ge j)$. Let $\psi$ be the fact "$W(\mathcal{S}^{\mathcal{P}}, \text{``the current run''}) \ge j$". By Corollary 6.6 we have that $E(d(k') \ge j) \supset E(I(d(k') \ge j))$, and since $I(d(k') \ge j) \supset \psi$ is valid, we also have that $(\mathcal{S}^{\mathcal{P}}, r, t+1-j) \models E\psi$. It follows that $(\mathcal{S}^{\mathcal{P}}, r', t+1-j) \models \psi \supset E\psi$ for all runs $r' \in \mathcal{S}^{\mathcal{P}}$. Given that $t < n$, the only executions that are similar to an execution $(r', t+1-j)$ are of the form $(r'', t+1-j)$. Thus, by Proposition 6.7(a) we have that $(\mathcal{S}^{\mathcal{P}}, r', t+1-j) \models C(\psi \supset E\psi)$ for all $r' \in \mathcal{S}^{\mathcal{P}}$, and the induction axiom implies that all executions $(r, t+1-j)$ satisfy $\psi \supset C\psi$, which is the claim of part (a). For part (b), recall from the proof of part (a) that if $D \ge j$, then there must be a clean round by time $t+1-j$. By part (a), if $W(\mathcal{S}^{\mathcal{P}}, r) = j$, then at time $t+1-j$ it is common knowledge that $I(D \ge j)$, and therefore in particular that $D \ge j$. It follows that at time $t+1-j$ it is common knowledge that a clean round (and hence also a seemingly clean round) has occurred. The claim now follows from Theorem 6.9(b). □

Thus, certain patterns of failures help the processors reach common knowledge of an identical view of the initial configuration early. In particular, if the wastefulness of the run is $j$, then the active processors obtain common knowledge of a common view of the initial configuration at time $t+1-j$. We now make precise our heretofore informal claim that it is the pattern of failures that determines the wastefulness of the runs of $\mathcal{S}^{\mathcal{P}}$. Given a system $\mathcal{S}$, a fact $\varphi$ is said to be about the failure pattern if $(\mathcal{S}, r, k) \models \varphi$ iff $(\mathcal{S}, r', k) \models \varphi$ for all runs $r, r' \in \mathcal{S}$ that have the same failure pattern. Observe that, by this definition, facts such as $d(k) \ge j$ and $D \ge j$ are facts about the failure pattern. We can now show:
Lemma 6.13: Let $\varphi$ be a fact about the failure pattern. Let $\sigma$ and $\sigma'$ be initial configurations, let $\pi$ be a failure pattern, and let $r = \mathcal{P}(\sigma, \pi)$ and $r' = \mathcal{P}(\sigma', \pi)$.

Then $(\mathcal{S}^{\mathcal{P}}, r, \ell) \models I\varphi$ iff $(\mathcal{S}^{\mathcal{P}}, r', \ell) \models I\varphi$, for all $\ell \ge 0$.

Sketch of proof: Assume that $(\mathcal{S}^{\mathcal{P}}, r', k) \not\models I\varphi$, and let $G = A(r', k)$. It follows that there is a run $r''$ such that $v(G, r', k) = v(G, r'', k)$ and $(\mathcal{S}^{\mathcal{P}}, r'', k) \not\models \varphi$. Let $Q$ be the set of processors on whose initial states $\sigma$ and $\sigma'$ disagree. Clearly $v(G, r', k)$ contains the view at time 0 (i.e., the initial state) of none of the processors in $Q$. Thus, without loss of generality $r'' = \mathcal{P}(\sigma', \pi'')$ for some $\pi''$. An inductive argument along the lines of the proof of Lemma 6.11(a) will now show that $v(G, r, k) = v(G, \mathcal{P}(\sigma, \pi''), k)$. (Note that $A(r, k) = A(r', k) = G$.) But because $\varphi$ is a fact about the failure pattern, it follows that $(\mathcal{S}^{\mathcal{P}}, \mathcal{P}(\sigma, \pi''), k) \not\models \varphi$, and hence $(\mathcal{S}^{\mathcal{P}}, r, k) \not\models I\varphi$, and we are done with one direction. The other direction of the argument is symmetric. □

We can now define the wastefulness of a failure pattern $\pi$, denoted $w(\pi)$, to be $W(\mathcal{S}^{\mathcal{P}}, r)$ for a run $r$ of the form $r = \mathcal{P}(\sigma, \pi)$ for some $\sigma$. Lemma 6.13 implies that $w(\pi)$ is independent of the initial configuration $\sigma$ chosen, and therefore $w(\pi)$ is well defined. Theorem 6.12 can now be read to state that if the failure pattern of a run is $\pi$, then at time $t+1-w(\pi)$ the active processors have common knowledge of a common view of the initial configuration. A closer inspection of the proofs of Theorem 6.5(c) and of Theorem 6.12 actually shows that if $w(\pi) = j$, then at time $t+1-j$ there is a particular $k'$ such that the active processors all know that $d(k') \ge j$, and for no $\ell > k'$ is it the case that an active processor knows that $d(\ell) \ge j$. By Theorem 6.12(a), $w(\pi) = j$ iff "$w = j$" is common knowledge at time $t+1-j$. It follows that the identity of this number $k'$ is also common knowledge at time $t+1-j$. Consequently, the active processors obtain common knowledge of a common view of the first $k'$ rounds of the run, and not only of the initial configuration. Furthermore, since $k'$ is determined by the implicitly known values of $d(k)$, Lemma 6.13 implies that the value of $k'$ is uniquely determined by $\pi$.

One  of  the  consequences  of  Theorem  6.12  and  Lemma  6.13  is: 

Corollary 6.14: There is a $t$-resilient protocol for SBA that reaches SBA in $t+1-w(\pi)$ rounds in all runs of the protocol in which the failure pattern is $\pi$, for all failure patterns $\pi$ in which at most $t$ processors fail.

Proof: The protocol (uniform for all processors $p_i$) is:
for ℓ ≥ 0 perform the following at time ℓ:
    if K(D ≥ t + 1 − ℓ)
        then halt (and send no messages in the following rounds);
             decide 0 if K("some initial value v_j was 0");
             decide 1 otherwise.
        else send the current view to all processors in round ℓ + 1.

The $K$ in the text of the protocol means "the processor knows", i.e., it is $K_i$ in $p_i$'s copy of the protocol. By Theorem 6.12(a), all correct processors halt after $t+1-W(\mathcal{S}^{\mathcal{P}}, r)$ rounds. By Theorem 6.12(b), the active processors have common knowledge of the fact that they have an identical view of the initial configuration. Thus, their decisions are identical. The decision function clearly satisfies the requirements of SBA. □
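The stopping rule of the protocol above can be illustrated with a small simulation of its halting time. This Python sketch rests on a simplifying assumption that is not part of the thesis: every failure is detected in the round in which it occurs and is known to all active processors one round later, so the test $K(D \ge t+1-\ell)$ can be evaluated on the true value of $D$. In the actual system, wastefulness is defined through implicit knowledge, and a failure that is never observed does not contribute to it. The function names are hypothetical, and the decision values are not simulated.

```python
def max_difference(fail_round, ell):
    """D(r, ell) = max over 0 <= k <= ell of N(r, k) - k."""
    return max(
        sum(1 for rnd in fail_round.values() if rnd <= k) - k
        for k in range(ell + 1)
    )


def halting_time(fail_round, t):
    """First time ell at which the stopping test D >= t + 1 - ell succeeds.

    ASSUMPTION (hypothetical, for illustration only): the processors' knowledge
    of D coincides with its true value, i.e. every failure is visible.
    """
    ell = 0
    while max_difference(fail_round, ell) < t + 1 - ell:
        ell += 1  # terminates by ell = t + 1, since D(r, ell) >= d(r, 0) = 0
    return ell


t = 3
print(halting_time({}, t))                            # 4 (failure-free: t + 1)
print(halting_time({"p1": 1, "p2": 1, "p3": 1}, t))   # 2 (t failures in round 1)
print(halting_time({"p1": 1, "p2": 2, "p3": 3}, t))   # 4 (one failure per round)
```

Under this assumption the halting time equals $t+1-\max_k d(r,k)$, which matches the example earlier in this section in which $t$ failures detected in round 1 lead to common knowledge at the end of round 2.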

The above protocol is not a protocol in the traditional sense of the word, but rather a knowledge-based protocol, to use the terminology of Halpern and Fagin in [HF]: a processor's actions at any given point are determined by the processor's knowledge. As they point out, not every knowledge-based protocol can be implemented. However, if the only knowledge required in the protocol is knowledge about the past, it is implementable. Thus, the above protocol can be directly translated into a standard protocol.

Notice that in runs in which many failures become visible early, SBA is attained by this protocol significantly earlier than time $t+1$. We are aware of no other protocol for SBA that stops before time $t+1$ in some cases. In the next section we will show that the protocol of Corollary 6.14 is optimal in the sense that for any given pattern of failures, it attains SBA no later than any other protocol for SBA does.

Corollary 6.8 and Theorem 6.12 imply that the stopping condition $K(D \ge t+1-\ell)$ implies $C(D \ge t+1-\ell)$. In fact, we will be able to show that this protocol is equivalent to the following protocol:

for ℓ ≥ 0 perform the following at time ℓ:
    if C("some initial value was 0")
        then decide 0 and halt
    else if C("some initial value was 1")
        then decide 1 and halt
    else send the current view to all processors in round ℓ + 1.
The number of bits of information required to describe a processor's view at round $k$ is exponential in $k$. Thus, messages in the above protocols might be too long to be practical. The only properties that we really needed for the analysis were that the protocol require all processors to send all other (active) processors a message in every round, and that every processor relay in its messages all the information it has about the initial configuration and about the pattern of failures. By modifying the protocol slightly so that messages specify only the sender's view of the initial configuration and of the failure pattern, we get a protocol for SBA with the same properties in which the length of each message is $O(n + t \log n)$.
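To make the $O(n + t \log n)$ bound concrete, the following Python sketch packs one possible compact message into an integer. The encoding is hypothetical and is only an assumption about what such a message could contain: two bits per processor for the sender's knowledge of that processor's initial value (unknown, 0, or 1), followed by a list of (processor, round) pairs standing in for the sender's view of the failure pattern, each field using $\lceil \log_2(n+1) \rceil$ bits. The thesis does not specify this format.

```python
UNKNOWN, ZERO, ONE = 0, 1, 2  # 2-bit code for what the sender knows of each input


def encode(values, failures, n):
    """Pack a compact message into an int; returns (word, number_of_bits).

    values:   list of n entries in {None, 0, 1} (None = input not known to sender)
    failures: list of (pid, rnd) pairs with 0 <= pid < n and 1 <= rnd <= n
    """
    b = n.bit_length()  # enough bits for any integer in 0..n
    word, width = 0, 0

    def push(x, bits):
        nonlocal word, width
        word |= x << width
        width += bits

    for v in values:
        push(UNKNOWN if v is None else (ZERO if v == 0 else ONE), 2)
    push(len(failures), b)
    for pid, rnd in failures:
        push(pid, b)
        push(rnd, b)
    return word, width


def decode(word, n):
    """Inverse of encode for a message from a system of n processors."""
    b = n.bit_length()
    pos = 0

    def pull(bits):
        nonlocal pos
        x = (word >> pos) & ((1 << bits) - 1)
        pos += bits
        return x

    values = []
    for _ in range(n):
        code = pull(2)
        values.append(None if code == UNKNOWN else (0 if code == ZERO else 1))
    count = pull(b)
    failures = [(pull(b), pull(b)) for _ in range(count)]
    return values, failures


n, t = 7, 2
values = [0, 1, None, 1, 1, 0, None]
failures = [(3, 1), (5, 2)]
word, width = encode(values, failures, n)
print(width)                                  # 29 (= 2*7 + 3 + 2*2*3)
print(decode(word, n) == (values, failures))  # True
```

The length is $2n + (2f+1)\lceil \log_2(n+1) \rceil$ bits for $f \le t$ reported failures, which is $O(n + t \log n)$.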

6.4  Lower  bounds 

We are about to show that the only non-trivial facts that can become common knowledge in a run $r$ of a $t$-uniform system $\mathcal{S}$ before time $t+1-W(\mathcal{S}, r)$ are facts about the wastefulness of the run. We do this by showing that all executions $(r, \ell)$ with $W(\mathcal{S}, r, \ell) \le t-\ell$ are similar. We first prove a lemma that is necessary for our proof of this fact. Roughly speaking, this lemma says that if $D(r, \ell) \le t-\ell$ and $p$ is the last processor to fail in $r$, then $(r, \ell)$ is similar to an execution in which $p$ does not fail, and all other processors behave as they do in $r$. To make this precise we make the following definition: Given a failure pattern $\pi$, the failure pattern $\pi^{-p}$ is defined to be $\pi - (p, k(p), Q(p))$ if there is a triple of the form $(p, k(p), Q(p))$ in $\pi$, and to be $\pi$ otherwise. Given a run $r = \mathcal{P}(\sigma, \pi)$, we define $r^{-p}$ to be $\mathcal{P}(\sigma, \pi^{-p})$. If $\mathcal{P}$ does not require all processors to send messages to all other processors in every round, $r$ can be said to display a number of failure patterns; i.e., we may have $\mathcal{P}(\sigma, \pi) = \mathcal{P}(\sigma, \pi')$ for $\pi \ne \pi'$. However, it is easy to check that if $\mathcal{P}(\sigma, \pi) = \mathcal{P}(\sigma, \pi')$, then $\mathcal{P}(\sigma, \pi^{-p}) = \mathcal{P}(\sigma, \pi'^{-p})$, so that $r^{-p}$ is well defined. We can now show:

Lemma 6.15: Let $t \le n-2$, and let $\mathcal{S}$ be a $t$-uniform system for $\mathcal{P}$, with $r = \mathcal{P}(\sigma, \pi) \in \mathcal{S}$. If $D(r, \ell) \le t-\ell$ and no processor fails in $r$ in a later round than $p$ does, then $(r, \ell) \sim (r^{-p}, \ell)$.

Proof: If $p$ does not fail in $r$, then $r = r^{-p}$, and the claim trivially holds. Thus, let $k$ be the round in which $p$ fails in $r$, and notice that by assumption no processor fails in $r$ in a later round. If $k > \ell$, then $(r, \ell) = (r^{-p}, \ell)$, and thus clearly $(r, \ell) \sim (r^{-p}, \ell)$. We still need to show the claim for $k \le \ell$. We do this by induction on $j = \ell - k$.

Case $j = 0$ (i.e., $k = \ell$): Let $q_i \ne q_j$ be two processors in $A(r, \ell)$, i.e., active at $(r, \ell)$. Such processors exist by the assumption that $t \le n-2$. Clearly, $q_i$'s view at $(r, \ell)$ is independent of whether or not $p$ sent a message to $q_j$ in round $\ell$. Thus, $(r, \ell) \sim (r', \ell)$, where $(r', \ell)$ differs from $(r, \ell)$ only in that $p$ does send a message to $q_j$ in round $\ell$ of $(r', \ell)$. (If $p$ sends $q_j$ a message in round $\ell$ of $r$, then $r = r'$.) Now, since $p$ does send $q_j$ a message in round $\ell$ of $(r', \ell)$, processor $q_j$'s view at $(r', \ell)$ is independent of whether $p$ fails in $(r', \ell)$ (it is consistent with $q_j$'s view at $(r', \ell)$ that $p$ sends messages to all processors in round $\ell$), and thus $(r', \ell) \sim (r^{-p}, \ell)$. By transitivity of $\sim$ we also have that $(r, \ell) \sim (r^{-p}, \ell)$.

Case $j > 0$ (i.e., $k < \ell$): Assume inductively that the claim holds for $j-1$. Again, let $Q = \{q_1, \ldots, q_s\}$ be the set of processors active at $(r, \ell)$ to whom $p$ fails to send a message in round $k$ of $(r, \ell)$. We prove our claim by induction on $s$. If $s = 0$, then no processor active in $(r, \ell)$ can distinguish whether $p$ failed in round $k$ or in round $k+1$. Thus, $(r, \ell) \sim (r', \ell)$, where $(r', \ell)$ differs from $(r, \ell)$ only in that rather than failing in round $k$, processor $p$ fails in round $k+1$ of $(r', \ell)$ before sending any messages. Since $\ell - (k+1) = j-1$, we have by the inductive hypothesis that $(r', \ell) \sim (r'^{-p}, \ell)$, and $r'^{-p} = r^{-p}$. By transitivity of $\sim$ we have that $(r, \ell) \sim (r^{-p}, \ell)$. Now assume that $s > 0$ and that the claim is true for $s-1$. Let $r_s$ be a run such that $(r_s, k) = (r, k)$, processor $q_s$ fails in round $k+1$ of $r_s$ before sending any messages, and no other processor fails in $r_s$ after round $k$. Clearly $D(r_s, \ell) \le t-\ell$, since $d(r_s, k') = d(r, k') \le t-\ell$ for all $k' \le k$, and
$$d(r_s, k+1) = N(r_s, k+1) - (k+1) = N(r, k) + 1 - (k+1) = d(r, k) \le t-\ell,$$
while $d(r_s, \cdot)$ only decreases after round $k+1$. Notice also that no processor fails in $(r_s, \ell)$ after round $k+1$. Thus, $r = r_s^{-q_s}$, and by the inductive assumption on $j-1$, we have that $(r_s, \ell) \sim (r, \ell)$. Let $p_i \in A(r_s, \ell)$. Clearly $p_i$'s view at $(r_s, \ell)$ is independent of whether $p$ sent a message to $q_s$ in round $k$ of $(r_s, \ell)$. Thus, $(r_s, \ell) \sim (r'_s, \ell)$, where $r'_s$ differs from $r_s$ only in that $p$ does send a message to $q_s$ in round $k$ of $r'_s$. Again by the inductive hypothesis for $j-1$ we have that $(r'_s, \ell) \sim (r', \ell)$, where $r' = (r'_s)^{-q_s}$. Processor $p$ fails to send round $k$ messages to only $s-1$ processors in $r'$, and thus by the inductive hypothesis for $s-1$ we have that $(r', \ell) \sim (r^{-p}, \ell)$. By the symmetry and transitivity of $\sim$, we have that $(r, \ell) \sim (r^{-p}, \ell)$, and we are done. □
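The operation $\pi \mapsto \pi^{-p}$ used in Lemma 6.15 is simple to express on an explicit failure pattern. In this hypothetical Python representation, a pattern maps each faulty processor $p$ to the pair $(k(p), Q(p))$; $Q(p)$ is carried along unchanged, and its precise meaning is as defined earlier in the thesis. The sketch removes the last processor to fail and checks, on one example, that this removes exactly one failure and does not increase $D$, which is what properties (i) and (ii) in the next paragraph rely on.

```python
def remove_failure(pattern, p):
    """pi^{-p}: drop p's triple (p, k(p), Q(p)) if present, else return pi unchanged."""
    return {q: triple for q, triple in pattern.items() if q != p}


def last_to_fail(pattern):
    """A processor whose failure round k(p) is maximal (None if no one fails)."""
    if not pattern:
        return None
    return max(pattern, key=lambda q: pattern[q][0])


def max_difference(pattern, ell):
    """D(r, ell) computed from the failure rounds k(p) stored in the pattern."""
    def d(k):
        return sum(1 for (rnd, _) in pattern.values() if rnd <= k) - k
    return max(d(k) for k in range(ell + 1))


# Hypothetical pattern: each faulty processor maps to (k(p), Q(p)).
pi = {
    "a": (1, frozenset({"c"})),
    "b": (1, frozenset({"d"})),
    "e": (2, frozenset({"c", "d"})),
}
p = last_to_fail(pi)
pi_minus_p = remove_failure(pi, p)
print(p, sorted(pi_minus_p))                                 # e ['a', 'b']
print(max_difference(pi, 2), max_difference(pi_minus_p, 2))  # 1 1
```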

The proof of Lemma 6.15 is a generalization and simplification of the basic inductive argument in the lower bound proofs of [DS], [LF], and [CD]. Notice that the run $r^{-p}$ in the statement of Lemma 6.15 has the following properties: (i) if $r$ is not free of failures, then the number of processors that fail in $r^{-p}$ is one fewer than in $r$; (ii) $D(r^{-p}, \ell) \le t-\ell$; and (iii) $(r^{-p}, 0) = (r, 0)$. We can now use Lemma 6.15 to show:
Theorem 6.16: Let $t \le n-2$ and let $\mathcal{S}$ be an independent $t$-uniform system.

a) If $\ell \le t$, then all failure-free executions $(r, \ell) \in \mathcal{S} \times \{\ell\}$ are similar.

b) If $W(\mathcal{S}, r, \ell) \le t-\ell$ and $W(\mathcal{S}, r', \ell) \le t-\ell$, then $(r, \ell) \sim (r', \ell)$.

Proof: (a) Assume that $\ell \le t$ and let $(r, \ell)$ and $(\hat{r}, \ell)$ be failure-free executions. We wish to show that $(r, \ell) \sim (\hat{r}, \ell)$. Let $Q = \{q_1, \ldots, q_s\}$ be the set of processors whose initial states in $r$ and $\hat{r}$ differ. We prove by induction on $s$ that $(r, \ell) \sim (\hat{r}, \ell)$. If $s = 0$, then $(r, \ell) = (\hat{r}, \ell)$ and we are done. Let $s > 0$ and assume inductively that all failure-free executions that differ from $(\hat{r}, \ell)$ in the initial state of no more than $s-1$ processors are similar to it. Let $(r_s, \ell)$ be an execution such that $(r, 0) = (r_s, 0)$, in which $q_s$ fails in the first round without sending any messages, and no other processor fails. Clearly $D(r_s, \ell) = 0 \le t-\ell$, and by Lemma 6.15 we have that $(r_s, \ell) \sim (r, \ell)$. Let $p_i \in A(r_s, \ell)$. Given that $\mathcal{S}$ is an independent $t$-uniform system, processor $p_i$'s view at $(r_s, \ell)$ does not determine whether the initial state of $q_s$ in $r_s$ is as in $r$ or as in $\hat{r}$. Thus, $(r_s, \ell) \sim (r'_s, \ell)$, where $r'_s$ differs from $r_s$ only in that the initial state of $q_s$ in $r'_s$ is as in $\hat{r}$. Again by Lemma 6.15 we have that $(r'_s, \ell) \sim (r', \ell)$, where $(r', 0) = (r'_s, 0)$ and $(r', \ell)$ is failure-free. Since $(r', \ell)$ differs from $(\hat{r}, \ell)$ only in the initial states of $s-1$ processors, by the inductive assumption we have that $(r', \ell) \sim (\hat{r}, \ell)$, and by the symmetry and transitivity of $\sim$ we have $(r, \ell) \sim (\hat{r}, \ell)$, and we are done with part (a).

(b) If $W(\mathcal{S}, r, \ell) \le t-\ell$, then in particular it is not implicit knowledge at $(r, \ell)$ that $d(k) > t-\ell$ for some $k \le \ell$. It follows that $(r, \ell) \sim (\tilde{r}, \ell)$ for some $\tilde{r} \in \mathcal{S}$ satisfying $D(\tilde{r}, \ell) \le t-\ell$. Using Lemma 6.15, a straightforward induction on the number of processors that fail in $(\tilde{r}, \ell)$ shows that $(\tilde{r}, \ell) \sim (\hat{r}, \ell)$, where $(\hat{r}, \ell)$ is failure-free. By transitivity of $\sim$ we have that $(r, \ell) \sim (\hat{r}, \ell)$. The same argument applies to $(r', \ell)$, and the claim now follows from part (a). □
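The induction in part (a) can be read as building an explicit chain of executions, each similar to the next. The Python sketch below constructs only the combinatorial skeleton of that chain, under the assumption that an execution is summarized by its initial configuration and the set of processors that crash silently in round 1; it does not model views or verify similarity itself, and the function names are hypothetical. For each processor $q_i$ on which the two configurations differ, the chain adds a silent crash of $q_i$ (Lemma 6.15), changes $q_i$'s initial state (independence), and removes the crash (Lemma 6.15 again).

```python
def similarity_chain(sigma, sigma_hat):
    """Chain of (initial configuration, silent round-1 crashes) from sigma to sigma_hat."""
    current = list(sigma)
    chain = [(tuple(current), frozenset())]
    for q, (a, b) in enumerate(zip(sigma, sigma_hat)):
        if a == b:
            continue
        crashed = frozenset({q})
        chain.append((tuple(current), crashed))      # q crashes silently (Lemma 6.15)
        current[q] = b
        chain.append((tuple(current), crashed))      # q's input changes unseen (independence)
        chain.append((tuple(current), frozenset()))  # q is correct again (Lemma 6.15)
    return chain


def is_legal_step(x, y):
    """One link: toggle a single silent crash, or change the initial state of
    exactly one processor that is crashed in both executions."""
    (cfg_x, crash_x), (cfg_y, crash_y) = x, y
    if cfg_x == cfg_y:
        return len(crash_x ^ crash_y) == 1
    changed = {i for i, (u, v) in enumerate(zip(cfg_x, cfg_y)) if u != v}
    return crash_x == crash_y and len(changed) == 1 and changed <= crash_x


chain = similarity_chain((0, 1, 0, 1), (1, 1, 1, 1))
print(len(chain))                                                # 7
print(all(is_legal_step(x, y) for x, y in zip(chain, chain[1:])))  # True
print(max(len(crashed) for _, crashed in chain))                 # 1
```

Since at most one processor is crashed at any point, every execution in the chain satisfies $D = 0 \le t-\ell$ when $\ell \le t$, which is the hypothesis Lemma 6.15 needs.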

Observe that the assumption of independence of the set of initial configurations is essential to this lower bound. Lemma 6.15 can also be used to characterize non-independent systems. Lemma 6.15 and Theorem 6.16(a) generalize and somewhat simplify the $t+1$ round lower bound on the worst-case behavior of SBA in our model (see [DLM], [DS], [FL], [H], [CD]). As we will see in the sequel, Theorem 6.16(b) allows us to completely characterize the runs in which $t+1$ rounds are necessary for attaining SBA, as well as those that require $k$ rounds, for all $k$. More generally, Proposition 6.7(a) and Theorem 6.16(b) provide us with a lower bound on the time by which facts can become common knowledge in $t$-uniform systems. Formally, we have:
Theorem 6.17: Let $t \le n-2$, and let $\mathcal{S}$ be an independent $t$-uniform system. If $(\mathcal{S}, r', \ell) \not\models \varphi$ holds for some $r' \in \mathcal{S}$ satisfying $W(\mathcal{S}, r') \le t-\ell$, then $(\mathcal{S}, r, \ell) \not\models C\varphi$ for all $r \in \mathcal{S}$ satisfying $W(\mathcal{S}, r) \le t-\ell$. □

Theorem 6.17 and Theorem 6.12(b) completely characterize when non-trivial facts about the initial configuration become common knowledge in the runs of $\mathcal{S}^{\mathcal{P}}$. In a precise sense, they imply that the only fact that is common knowledge at $(r, k)$, for $k \le t - W(\mathcal{S}^{\mathcal{P}}, r)$, is that the wastefulness is less than $t+1-k$. Formally, we have:

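Corollary 6.18: Let $t \le n - 2$, let $S_{\hat{\mathcal P}}$ be an independent $t$-uniform system for $\hat{\mathcal P}$, and let $W(S_{\hat{\mathcal P}}, r) \le t - \ell$. Then $(S_{\hat{\mathcal P}}, r, \ell) \models C\varphi$ iff for all $r' \in S_{\hat{\mathcal P}}$ such that $W(S_{\hat{\mathcal P}}, r', \ell) \le t - \ell$ it is the case that $(S_{\hat{\mathcal P}}, r', \ell) \models \varphi$. □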

Furthermore, Corollary 6.8 and Theorem 6.17 immediately imply:

Corollary 6.19: Let $t \le n - 2$, let $\mathcal P$ be a $t$-resilient protocol for SBA, and let $S$ be a $t$-uniform system for $\mathcal P$, with $r \in S$. Then SBA is not attained in $r$ in fewer than $t + 1 - W(S, r)$ rounds. □

Corollary 6.19 proves that SBA cannot be attained in the runs of $\hat{\mathcal P}$ any earlier than it is attained by the protocol of Corollary 6.14. However, it still seems possible that some other protocol attains SBA in fewer rounds than the protocol of Corollary 6.14. We now show that this protocol is optimal in a rather strong sense: for any given initial configuration and failure pattern, no protocol attains SBA in fewer rounds than the protocol of Corollary 6.14. This fact follows from the following theorem, which states that the wastefulness of a run resulting from a given initial configuration and failure pattern is no greater than the wastefulness of the run of $S_{\hat{\mathcal P}}$ with the same initial configuration and failure pattern. Given Corollary 6.19, this will imply that the protocol of Corollary 6.14 always attains SBA at the earliest possible time, given the initial configuration and failure pattern.
Theorem 6.20: Let $S$ be a $t$-uniform system for a protocol $\mathcal P$, let $r = \mathcal P(\sigma, \pi)$, and let $\hat r = \hat{\mathcal P}(\sigma, \pi)$. Then $W(S, r) \le W(S_{\hat{\mathcal P}}, \hat r)$.

Proof: We will show a more general fact from which the theorem will follow. Given an initial configuration $\sigma'$ and a failure pattern $\pi'$, let $r' = \mathcal P(\sigma', \pi')$ and $\hat r' = \hat{\mathcal P}(\sigma', \pi')$. Notice that $A(\hat r, k') = A(r, k')$ for all $k'$. We claim that for all $k$ and all $p_i \in A(r, k)$ it is the case that if $v(p_i, \hat r, k) = v(p_i, \hat r', k)$ then $v(p_i, r, k) = v(p_i, r', k)$. We argue by induction on $k$. The case $k = 0$ is immediate. Let $k > 0$ and assume inductively that the claim holds for all processors in $A(r, k - 1)$ at time $k - 1$. Thus, if $v(p_i, \hat r, k) = v(p_i, \hat r', k)$ and $p_j$ sends a round $k$ message to $p_i$ in $\hat r$, then $p_j$ has the same view at $(\hat r, k - 1)$ and $(\hat r', k - 1)$, and $p_j$ also sends $p_i$ a round $k$ message in $\hat r'$. In this case both $\pi$ and $\pi'$ determine that round $k$ messages from $p_j$ to $p_i$ are delivered. By the inductive assumption $p_j$ also has the same view in $(r, k - 1)$ and in $(r', k - 1)$. It follows that $\mathcal P$ requires $p_j$ to act identically in round $k$ of both $r$ and $r'$: if $p_j$ is required to send $p_i$ a round $k$ message in $r$, then it is required to send $p_i$ the same message in round $k$ of $r'$. Processor $p_j$ does not send a round $k$ message to $p_i$ in $\hat r$ only if $\pi$ determines that $p_j$ cannot send $p_i$ such a message. But then, for similar reasons, $\pi'$ must also determine that $p_j$ does not send $p_i$ a round $k$ message. It follows that in this case $p_j$ does not send $p_i$ a round $k$ message in $r$ or in $r'$. Thus, for all processors $p_j$ it is the case that $p_i$ receives a round $k$ message from $p_j$ in $r$ iff $p_i$ receives an identical message from $p_j$ in round $k$ of $r'$. The inductive assumption also implies that $v(p_i, r, k - 1) = v(p_i, r', k - 1)$, and it now follows that $v(p_i, r, k) = v(p_i, r', k)$, which completes the proof of the claim.

We now show how the theorem follows from this claim. Assume that $W(S, r) = j$ and that $W(S_{\hat{\mathcal P}}, \hat r) < j$. Then there is a time $k$ such that $(S, r, k) \models I(D \ge j)$ and $(S_{\hat{\mathcal P}}, \hat r, k) \not\models I(D \ge j)$. Let $G = A(\hat r, k)$ (notice that $G = A(r, k)$ as well). It follows that there is a run $\hat r' \in S_{\hat{\mathcal P}}$ such that $v(G, \hat r, k) = v(G, \hat r', k)$ and $D(\hat r', k) < j$. Let $\sigma'$ and $\pi'$ be the initial configuration and failure pattern in $\hat r'$, and let $r' = \mathcal P(\sigma', \pi')$. Since $v(G, \hat r, k) = v(G, \hat r', k)$, our claim implies that $v(G, r, k) = v(G, r', k)$. But since $D(r', k) = D(\hat r', k) < j$ and $A(r, k) = G$, we have that $(S, r, k) \not\models I(D \ge j)$, contradicting our original assumption. □

Theorem 6.20 and Corollary 6.19 now imply that the protocol of Corollary 6.14 is indeed optimal in the strong sense we intended: given any initial configuration and failure pattern, it attains SBA as early as any $t$-resilient protocol for SBA can. In light of Theorem 6.20, we can talk about the inherent wastefulness $w(\pi)$ of a failure pattern $\pi$, defined to be $W(S_{\hat{\mathcal P}}, \hat{\mathcal P}(\sigma, \pi))$. That $w(\pi)$ is well defined follows from the fact that runs $r$ of $S_{\hat{\mathcal P}}$ have the property that $W(S_{\hat{\mathcal P}}, r, k)$ depends only on the pattern of failures and is independent of the initial configuration. This can be proved by a somewhat tedious but straightforward induction on $k$, and is left to the reader. Theorem 6.16 through Corollary 6.19 can now be viewed as statements about the effect of the failure pattern on the similarity of executions, and on what facts can become common knowledge at various times in the execution of an arbitrary $t$-resilient protocol. Corollaries 6.14 and 6.19 tell us that exactly $t + 1 - w(\pi)$ rounds are necessary and sufficient to attain SBA in runs with failure pattern $\pi$: no $t$-resilient protocol for SBA attains it sooner, and the protocol of Corollary 6.14 attains it then (in the rest of the chapter we will use $\pi$ to refer to the failure pattern of the run in question). This provides a complete characterization of the number of rounds required to reach SBA in a run, given the pattern in which failures occur.
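The round count above can be illustrated with a small Python sketch that does only the arithmetic. It rests on an assumption that this section does not restate: following the inequalities used in the proof of Theorem 6.16, we take $d(k)$ to be (roughly) the number of processors whose failure has been discovered by time $k$, minus $k$, and the wastefulness to be the largest value of $d(k)$ over $k \ge 0$, so that a failure-free pattern has wastefulness 0. The input list `discovered` and both function names are hypothetical; the sketch does not model views or implicit knowledge, and the formula applies under the chapter's hypothesis $t \le n - 2$.

```python
def wastefulness(discovered: list[int]) -> int:
    """Largest d(k) = (failures discovered by time k) - k over k >= 0.

    discovered[k - 1] is the (hypothetical) cumulative number of
    discovered failures at the end of round k; at time 0 nothing has
    been discovered, so d(0) = 0 and the result is never negative.
    """
    best = 0  # d(0) = 0
    for k, count in enumerate(discovered, start=1):
        best = max(best, count - k)
    return best


def sba_rounds(t: int, discovered: list[int]) -> int:
    """Rounds needed (and sufficient) for SBA: t + 1 - w(pi)."""
    return t + 1 - wastefulness(discovered)


if __name__ == "__main__":
    t = 4
    print(sba_rounds(t, []))         # failure-free run: 5 rounds
    print(sba_rounds(t, [4, 4]))     # t failures seen in round 1: 2 rounds
    print(sba_rounds(t, [1, 2, 3]))  # one new failure per round: 5 rounds
```

Failures that are discovered early raise $d(k)$ and so shorten the run, which is the effect discussed at the end of the chapter.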

We have seen that the only facts that can become common knowledge before time $t + 1 - w(\pi)$ are facts about the wastefulness of the run. In the previous section we saw that in runs of $S_{\hat{\mathcal P}}$ the processors attain common knowledge of an identical view of the initial configuration at time $t + 1 - w(\pi)$. Thus, we have a complete description of when facts about the initial configuration become common knowledge. It is interesting to ask the more general question of when arbitrary facts become common knowledge. As we remarked in the previous section, the proofs of Lemma 6.11 and Theorem 6.12 can be used to show that at time $t + 1 - w(\pi)$ in a run of $S_{\hat{\mathcal P}}$, the active processors attain common knowledge not only of the fact that they have an identical view of the initial configuration. Rather, there is a natural number $k \ge 0$ such that at time $t + 1 - w(\pi)$ they attain common knowledge of an identical view of the state of the system at time $k$. We denote this number $k$ by $k_1(\pi)$. There is some number, say $f_1$, of processors that are commonly known at time $t + 1 - w(\pi)$ to have failed by time $k_1(\pi)$. Let $t_1 = t - f_1$. Roughly speaking, time $k_1(\pi) + 1$ can now be regarded as the start of a new run, and for appropriate definitions of $d_1(k)$ and $w_1(\pi)$, we get that at time $(k_1(\pi) + 1) + t_1 + 1 - w_1(\pi)$ the system will attain common knowledge of a common view of the state of the system at time $k_1(\pi) + 1$. Interestingly, it can be shown that $(k_1(\pi) + 1) + t_1 + 1 - w_1(\pi) = t + 2 - w(\pi)$. That is, one round after the processors attain common knowledge of (a common view of) the state of the run at time $k_1(\pi)$, they attain common knowledge of a common view of the state of the run at time $k_1(\pi) + 1$. In fact, again we have some number $k'' \ge 0$ such that the processors have common knowledge at time $t + 2 - w(\pi)$ of a common view of the state of the system at time $k''$. Denoting this number by $k_2$, the above analysis can be repeated. We leave further details to the interested reader.

The result of the analysis discussed in the preceding paragraph is that at any point after time $t + 1 - w(\pi)$ in a run of $\hat{\mathcal P}$, the active processors have common knowledge of a common view of the first $k$ rounds, for a number $k$ that can be computed given the failure pattern $\pi$. Following every round after time $t + 1 - w(\pi)$, the active processors attain common knowledge of a common view of at least one additional round. Consequently, there is a window of common plausibility covering a number of the most recent rounds, about which no non-trivial facts are common knowledge, while a common view of all preceding rounds is common knowledge. The size of this window at a given point is $t$ minus the number of processors that (at that point) are not commonly known to have failed. This classification of what facts are common knowledge in the runs of $S_{\hat{\mathcal P}}$ provides good upper bounds on when a simultaneous action that depends on the first $k$ rounds can be carried out by all active processors in a consistent way. The lower bound results of this section can be used to show that these bounds are tight in all runs, and thus we have a complete characterization of when simultaneous actions that depend on the first $k$ rounds can be carried out, as a function of the failure pattern.

6.5  Applications 

Throughout the chapter we have shown how our results regarding when common knowledge of various facts is attained in a Byzantine system affect the SBA problem. In this section we discuss some further consequences of the analysis presented in the previous sections. This is intended to illustrate the types of applications that the analysis can be used for. We start by considering some problems that are closely related to SBA.

The problem of Weak SBA (WSBA), which differs from SBA in that clause (4) is changed so that the active processors are required to decide on a value $v$ only if all initial values were $v$ and no processor fails, was introduced by Lamport as a weakening of SBA. However, Theorem 6.16(b) immediately implies that, in any run of a $t$-resilient protocol for WSBA with failure pattern $\pi$, the active processors do not have common knowledge of whether any processors failed before time $t + 1 - w(\pi)$. And since SBA can already be performed at time $t + 1 - w(\pi)$, $t$-resilient protocols cannot attain WSBA any earlier than they can attain SBA. Theorem 6.16 also explains why the variant of SBA used in this chapter (which was introduced by [FL]) is essentially equivalent to the original version of the Byzantine Generals problem of [PSL], in which only one processor initially has a value, and the processors need to decide on this value if that processor does not fail, and on a consistent value otherwise.

It has been a folk conjecture that a $t$-resilient protocol that guarantees that a non-trivial action is performed simultaneously must require $t + 1$ rounds in the worst case. We now show that this is not the case. Let bivalent agreement be defined by clauses (1)-(3) of SBA, with clause (4) replaced by:

4'. At least one run of the protocol decides 0, and at least one run decides 1.

Thus, a $t$-resilient protocol for bivalent agreement is a protocol $\mathcal P$ with the property that all runs of the independent $t$-uniform system $S$ for $\mathcal P$ in which the set of initial configurations is $\{0,1\}^n$ satisfy clauses (1)-(3), at least one run of $S$ decides 0, and at least one run decides 1. Proposition 6.7 implies that any action that is guaranteed to be performed simultaneously requires some fact to become common knowledge before the action can be performed. Theorem 6.12(b) implies that at the end of round 2 of $S_{\hat{\mathcal P}}$ it is common knowledge whether or not the wastefulness of the run is $t - 1$ (i.e., whether $t$ processors were seen to have failed in the first round). Thus, we can easily derive a $t$-resilient protocol for bivalent agreement: each processor follows $\hat{\mathcal P}$ for the first two rounds, and then decides 0 if it knows that $t$ processors failed in the first round, and 1 otherwise. This protocol attains bivalent agreement in two rounds, and Theorem 6.17 implies that there is no faster protocol for bivalent agreement so long as $t \le n - 2$. Furthermore, it implies that in a precise sense this is the only two-round protocol for bivalent agreement. We leave it to the reader to check that if $t \ge n - 1$ then there is a protocol for bivalent agreement that requires only one round. Thus, bivalent agreement is a truly easier problem than SBA. We note that [FLP] and [DDS] prove that in an asynchronous system there is no 1-resilient protocol for an even weaker variant of bivalent agreement.
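The decision step of the two-round protocol can be sketched in Python as follows. This is an illustration only: it assumes each processor has already extracted, from its view at the end of round 2, the set of processors it knows were seen to fail in round 1, and the argument name `known_round1_failures` is hypothetical. That all active processors reach the same decision is the content of Theorem 6.12(b); nothing in this function checks it.

```python
def bivalent_decision(known_round1_failures: set, t: int) -> int:
    """Decision of an active processor at the end of round 2.

    known_round1_failures: processors this processor knows, at time 2,
    were seen to fail in round 1 (hypothetical input). Since at most t
    processors can fail, reaching t means exactly t failed in round 1.
    """
    return 0 if len(known_round1_failures) >= t else 1


if __name__ == "__main__":
    t = 2
    print(bivalent_decision({3, 5}, t))  # 0: t failures seen in round 1
    print(bivalent_decision({3}, t))     # 1: fewer than t
```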

We have stressed the connection between common knowledge and simultaneous actions. Interestingly, the lower bounds on the time required for attaining common knowledge imply worst-case bounds on the behavior of $t$-resilient protocols that perform coordinated actions that are not required to be performed simultaneously. For example, Eventual Byzantine Agreement (EBA) is defined by clauses (1), (2), and (4) of SBA: the processors' decisions need not be simultaneous (cf. [DRS]). There are well-known protocols that attain EBA after two rounds in failure-free runs (for which $w(\pi) = 0$). However, using Proposition 6.7 and Theorems 6.17 and 6.20 it is not hard to show that a $t$-resilient protocol for EBA must require $t + 1$ rounds in some runs with $w(\pi) = 0$. More generally, these theorems show that such a protocol must require $t + 1 - j$ rounds in some runs with $w(\pi) = j$. This is a slight refinement of the well-known fact that EBA requires $t + 1$ rounds in the worst case (cf. [DRS]). Many relevant and interesting aspects of EBA are not covered by our analysis. We believe that an analysis of EBA should involve a study of when the states of $\varepsilon$-common knowledge and eventual common knowledge (cf. Chapter 2) are attained in a Byzantine environment. This is an interesting open problem.

As our investigation centered around $t$-resilient protocols, we now briefly discuss some other possible reliability assumptions. Recall that Corollary 6.10 states that all active processors are guaranteed to have an identical view of the system's initial configuration at time $t + 1$ in every run of a $t$-uniform system for $\hat{\mathcal P}$. This follows simply from the fact that at time $t + 1$ it is common knowledge that one of the previous rounds was clean. Instead of $t$-resiliency, we could require that a protocol for SBA be guaranteed to attain SBA so long as no more than $k$ consecutive rounds are dirty. In the system corresponding to all the runs of $\hat{\mathcal P}$ in which at most $k$ consecutive rounds are dirty, it is common knowledge at time $k + 1$ that a clean round has occurred, and $\hat{\mathcal P}$ can be converted into a protocol for SBA that is guaranteed to attain SBA in no more than $k + 1$ rounds. This means, for example, that if processors in a Byzantine system are known to fail at least two at a time, SBA can be achieved in $t/2 + 1$ rounds. Having a bound of $k$ consecutive dirty rounds seems in many cases to be a more appropriate assumption about a system than having a bound of $t$ on the total number of failures possible, since the latter is not a local assumption. Of course, these two assumptions are not mutually exclusive: we may often have a small bound on the possible number of consecutive dirty rounds, while only a much larger bound holds for the total number of failures. The bound on the number of consecutive dirty rounds implies a good upper bound on SBA in the case of crash failures.
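The counting behind the $k + 1$ bound, and behind the $t/2 + 1$ bound when processors fail at least two at a time, can be sketched in Python. The failure schedule (a list giving how many processors fail in each round) is a hypothetical input, and a round is treated as dirty whenever at least one processor fails in it; this is a simplification for illustration, not the chapter's formal definition of a clean round.

```python
def max_consecutive_dirty(schedule: list[int]) -> int:
    """Length of the longest block of consecutive rounds with a failure."""
    longest = current = 0
    for failures in schedule:
        current = current + 1 if failures > 0 else 0
        longest = max(longest, current)
    return longest


def sba_round_bound(schedule: list[int]) -> int:
    """With at most k consecutive dirty rounds, a clean round is common
    knowledge by time k + 1, so SBA takes at most k + 1 rounds."""
    return max_consecutive_dirty(schedule) + 1


if __name__ == "__main__":
    t = 6
    pairs = [2, 2, 2]  # failures arrive two at a time, t in total
    print(max_consecutive_dirty(pairs), sba_round_bound(pairs))  # 3 4
    print(sba_round_bound([1, 0, 1, 1, 0]))                      # 3
```

With failures arriving in pairs and at most $t$ failures overall, at most $\lfloor t/2 \rfloor$ rounds can be dirty at all, which is where the $t/2 + 1$ figure comes from.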

Another way we can consider varying the reliability assumptions about the system is by restricting the number of possible processor failures that can occur in a round. For example, let us consider the assumption that at most one processor can fail in any given round of the computation, and at most $t$ processors might fail overall. We are interested in the question of whether such assumptions allow us to attain SBA quickly. Unfortunately, the lower bound proofs of Lemma 6.15 and Theorem 6.16 work very well for this reliability model. In fact, since all of the runs of such a system are guaranteed to have wastefulness 0, even bivalent agreement cannot be attained in any run of the system in less than $t + 1$ rounds! SBA and WSBA clearly require $t + 1$ rounds in all runs of the system. We now present a somewhat artificial variant of this assumption that provides us with a non-uniform reliability assumption whose behavior is interesting and somewhat counter-intuitive: we say that a protocol for SBA is one visible failure resistant (1-VFR) if it is guaranteed to attain SBA so long as no more than one processor failure becomes visible to the active processors in any given round. The set of possible runs of a protocol $\mathcal P$ that display such behavior will be called a visibly restrained system for $\mathcal P$. It is possible to show that in the visibly restrained system for the simple protocol $\hat{\mathcal P}$ of Section 6.3 it is common knowledge at time 2 whether round 1 is clean, and therefore WSBA can be attained in two rounds. However, SBA can be shown to require $n - 1$ rounds in runs of $\hat{\mathcal P}$ in which one processor fails in every round except possibly the $(n - 1)$st round. (If one adds a bound of $t \le n - 2$ on the total number of failures possible, $n - 1$ is replaced by $t + 1$.) Interestingly, there is a 1-VFR protocol for SBA that is guaranteed to attain SBA in three rounds (in all runs)! Thus, for the 1-VFR reliability model, our simple protocol is no longer a most general protocol. The reason for the odd behavior of 1-VFR protocols is that the patterns of failures of the runs that satisfy 1-VFR are intimately related to the structure of the protocol. Thus, the protocol can restrict the patterns of failures possible and make effective use of the 1-VFR assumption.

6.6  Discussion 

We have analyzed the states of knowledge attainable in the course of the execution of various protocols in the system, for the case of a particular simple model of unreliable distributed systems that is fairly popular in the literature. Motivated by Chapters 1 and 2, the analysis focused mainly on when various facts about the system become common knowledge, given an upper bound of $t$ on the number of possibly faulty processors. This problem was shown to correspond directly to the question of when simultaneous actions of various types can be performed by the processors in such a system. In particular, this is a generalization of Simultaneous Byzantine Agreement and related problems. By deriving exact bounds on when facts become common knowledge, we immediately obtained exact bounds for SBA and many other problems. An interesting fact that came out of the analysis was that the pattern in which processors fail in a given run determines a lower bound on the time in which facts about the system's initial configuration become common knowledge, with different patterns determining different bounds. Ironically, facts become common knowledge faster in cases when many processors fail early in the run. The somewhat paradoxical argument for this is that, given an upper bound on the total number of failures possible, if many processors fail early then only few can fail later. The protocol can make effective use of the fact that the rest of the run is relatively free of failures. As a by-product of the analysis, we were able to derive a simple improved protocol for SBA that is optimal in all runs.

Our analysis shows that the essential driving force behind many of the phenomena in unreliable systems seems to be the inherent uncertainty that a particular site in such a system has about the global state of the system. We come to grips with this uncertainty by performing a knowledge-based analysis of such a system. We stress that our analysis was by and large restricted to protocols for simultaneous actions in a rather clean and simple model of unreliable systems: synchronous systems with global clocks and crash failures. We believe that performing similar analyses for nastier models of failures will prove very exciting, and will provide a much better understanding of the true structure underlying the richer failure models, and of the differences between the failure models. The ideas and techniques developed in this chapter should provide a sound basis on which to build such an analysis, although it is clear that a number of additional ideas would be required.

In summary, the analysis in this chapter makes explicit and essential use of reasoning about knowledge in order to obtain insight into a well-known problem in distributed systems. The generality and applicability of our results suggest that this is a promising approach.


Chapter  7 


Conclusions 


Knowledge seems to play an important role in distributed computing. A major advantage protocol designers obtain by intuitively reasoning about processors' knowledge in a distributed system is that personal experience from everyday life can be used to facilitate the design of distributed protocols. This thesis suggests that it is worthwhile to carry this approach one step further, and ascribe knowledge to processors in a precise and formal way. This is useful because there are many subtleties involved in distinguishing relevant states of knowledge that may initially seem similar but are in fact very different. Understanding what the relevant states of knowledge are and how they relate to one another is a major problem for future work. We have shown a close relationship between certain states of knowledge and the ability to perform coordinated actions of various kinds. In particular, the inability to perform certain types of actions under particular conditions was captured in a rather general way by our negative results in Chapters 3, 4, and 6 regarding the inability to attain the related states of knowledge in those circumstances. Thus, it seems that reasoning about the attainability of states of knowledge under various conditions may be a good way to study the properties of systems of various kinds.

Whereas our treatment dealt mainly with a rather general model of distributed systems, it seems desirable and promising to study particular types of distributed systems of interest from the point of view of the states of knowledge that runs of various protocols attain in such systems. A good example of this is the work of Chandy and Misra in [ChM], in which they capture some essential properties of totally asynchronous systems by performing a knowledge-based analysis of such systems. Our analysis in Chapter 6 also provides a general setting and some new insights into the properties of a particular model of systems of unreliable processors. Much more work in this direction needs to be done.

Using the language of knowledge to specify, present, and perhaps synthesize distributed protocols seems attractive in a number of ways. First, it is often an intuitive way to think about the protocols. Furthermore, it may be a more unified and portable way of communicating the protocol. The work of Afrati, Papadimitriou, and Papageorgiou in [APP] shows a case in which it is not clear how to specify the goals of a protocol other than in terms of attaining a particular state of knowledge, and Halpern and Fagin in [HF] define and discuss knowledge-based protocols in a formal way. This is another area requiring further work.

In Chapter 2 we presented a very general framework for ascribing knowledge to processors in a distributed system. However, in later chapters we used only state-based interpretations, in which processors are ascribed a great deal of knowledge without being required to perform any kind of computation to attain it. The reason for this was that we were working on a rather coarse level of investigation, looking at problems in which the computational complexity was negligible and the information-theoretic aspects played a major role. However, for many applications in fields such as AI and cryptography the computational problems involved are of major importance. In such circumstances it is essential to develop a good theory of computationally-based knowledge. We regard this as one of the harder problems in the area, and one in which progress is urgently necessary. Fagin and Halpern in [FH] take a significant first step towards such a theory for AI applications. Goldwasser, Micali, and Rackoff in [GMR] present a theory of knowledge complexity that promises to be a good starting point for developing a computationally-based theory of knowledge for cryptography.

In summary, the study of knowledge in distributed environments is still in its infancy, but promises to provide insights into many aspects of distributed environments. We have shown that it is possible to make reasoning about knowledge in a distributed environment precise, and used such reasoning to obtain new insight into some well-known problems. I would like to close this thesis with two quotes. The first, a fortune cookie that I got while working with Cynthia Dwork on the material of Chapter 6:

“Imagination is more important than knowledge”,

and the second, a quote from Marc Chagall at age 90:

“All I know is that one understands only what one loves”.

Of course, these sayings suggest further inspiring directions for future work...
Appendix A


Systems of Modal Logic


This appendix contains a brief summary of the axioms of the modal systems S4 and S5 referred to in the thesis. For a more detailed exposition of these systems and their role in modal logics of knowledge, see [HM]. We use $M$ for a generic modal operator (corresponding to $K_i$, $C_G$, $I_G$, etc.).

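The system S5 consists of the axioms:

A1. $M(\varphi \supset \psi) \supset (M\varphi \supset M\psi)$ (Consequence Closure)
A2. $M\varphi \supset \varphi$ (Knowledge)
A3. $M\varphi \supset MM\varphi$ (Positive Introspection)
A4. $\neg M\varphi \supset M\neg M\varphi$ (Negative Introspection)

And the rules of inference:

R1. From $\vdash \varphi$ and $\vdash (\varphi \supset \psi)$ infer $\vdash \psi$ (Modus Ponens)
R2. From $\vdash \varphi$ infer $\vdash M\varphi$ (Generalization)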


Under the total view interpretation of knowledge, $K_i$, $C_G$, and $I_G$ all satisfy the axioms of S5. In systems of unreliable processors (the case of Chapter 6), the common knowledge operator $C$ corresponding to “common knowledge among the active processors” also satisfies the axioms of S5.

The system consisting of axioms A1-A3 and inference rules R1 and R2 is called S4. The implicit knowledge operator of Chapter 6, corresponding to “implicit knowledge among the active processors”, satisfies the axioms of S4 but does not satisfy axiom A4, and hence does not satisfy S5.
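The difference between S5 and S4 can be checked mechanically on small Kripke structures. The Python sketch below is a toy illustration, not a model of the operators of Chapter 6: it interprets $M$ as a box operator over an accessibility relation on a finite set of worlds, and tests A2-A4 against every proposition (every subset of worlds). An equivalence relation yields all three axioms, while a reflexive and transitive but non-symmetric relation satisfies A2 and A3 and fails A4, mirroring the situation described above for implicit knowledge. All names are our own.

```python
from itertools import chain, combinations


def box(relation, worlds, prop):
    """Worlds where M(prop) holds: every accessible world is in prop."""
    return frozenset(w for w in worlds
                     if all(v in prop for v in relation.get(w, ())))


def subsets(worlds):
    """Every proposition, i.e. every subset of the worlds."""
    items = sorted(worlds)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)))


def satisfies(relation, worlds, axiom):
    """Check A2, A3 or A4 for every proposition over the given frame."""
    for p in subsets(worlds):
        mp = box(relation, worlds, p)
        if axiom == "A2" and not mp <= p:  # M p implies p
            return False
        if axiom == "A3" and not mp <= box(relation, worlds, mp):  # M p implies M M p
            return False
        if axiom == "A4":  # not M p implies M not M p
            not_mp = worlds - mp
            if not not_mp <= box(relation, worlds, not_mp):
                return False
    return True


if __name__ == "__main__":
    worlds = frozenset({0, 1, 2})
    partition = {0: {0, 1}, 1: {0, 1}, 2: {2}}  # equivalence relation
    preorder = {0: {0, 1}, 1: {1}, 2: {2}}      # reflexive, transitive, not symmetric
    for name, rel in [("partition", partition), ("preorder", preorder)]:
        print(name, [satisfies(rel, worlds, a) for a in ("A2", "A3", "A4")])
    # partition [True, True, True]
    # preorder [True, True, False]
```

Axiom A1 and the rules R1 and R2 hold for any box operator of this form, which is why the sketch does not test them.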




Appendix  B 


A  Logic  With  Fixpoint  Definitions 



In this appendix we present a logic with greatest fixpoint definitions and illustrate how common knowledge and variants of common knowledge can be formally defined as greatest fixpoints. Our presentation follows Kozen's in [Koz]. For other treatments of modal logics with fixpoint constructions, we refer the reader to [Koz] and [Fit].

Before we define the logic, we need to review a number of relevant facts about
fixpoints. Given a set $S$, we will be considering operators $f$ mapping subsets of $S$
to subsets of $S$. A subset $A$ of $S$ is said to be a fixpoint of $f$ if $f(A) = A$. A greatest
(respectively, least) fixpoint of $f$ is a set $B$ that is a fixpoint of $f$ and that satisfies
$A \subseteq B$ (resp. $B \subseteq A$) for all fixpoints $A$ of $f$. It follows that if $f$ has a greatest
fixpoint $B$, then $B = \bigcup\{A : f(A) = A\}$. The operator $f$ is said to be monotone
if $f(A) \subseteq f(B)$ whenever $A \subseteq B$. The Knaster–Tarski theorem (cf. [T]) implies
that a monotone operator has a greatest (and a least) fixpoint. Given an operator
$f$ and a subset $A$, define $f^0(A) = A$ and $f^{i+1}(A) = f(f^i(A))$. The operator $f$ is said to be
continuous if $f(\bigcap_i A_i) = \bigcap_i f(A_i)$ for all decreasing sequences $A_1 \supseteq A_2 \supseteq \cdots$. Given a monotone
and continuous operator $f$, the greatest fixpoint of $f$ is the set

$$\bigcap_{n<\omega} f^n(S).$$

Similarly, if $f$ is monotone and satisfies $f(\bigcup_i A_i) = \bigcup_i f(A_i)$ for all increasing sequences $A_1 \subseteq A_2 \subseteq \cdots$, then the least fixpoint of $f$ is $\bigcup_{n<\omega} f^n(\emptyset)$.
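On a finite set, every monotone sequence of subsets is eventually constant, so iterating a monotone operator from $S$ or from $\emptyset$ stabilizes after finitely many steps. The sketch below uses a made-up monotone operator on $\{0, \ldots, 4\}$ (an illustration, not an operator from the thesis) to compute the greatest and least fixpoints by iteration and to check them against a brute-force enumeration of all fixpoints.

```python
from itertools import combinations

# A hypothetical five-element universe S and a monotone operator f on its subsets.
S = frozenset(range(5))

def f(A):
    """Monotone: A <= B implies f(A) <= f(B). Keeps the even elements of A, plus 0."""
    return frozenset({0}) | frozenset(x for x in A if x % 2 == 0)

def iterate(op, start):
    """Apply op until the set stops changing (terminates on a finite universe
    for a monotone op started at S or at the empty set)."""
    current = start
    while op(current) != current:
        current = op(current)
    return current

def all_subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

fixpoints = [A for A in all_subsets(S) if f(A) == A]
gfp = iterate(f, S)             # limit of the decreasing chain f^n(S)
lfp = iterate(f, frozenset())   # limit of the increasing chain f^n(empty set)

print(sorted(gfp), sorted(lfp))  # [0, 2, 4] [0]
assert gfp == frozenset().union(*fixpoints)   # greatest = union of all fixpoints
assert lfp == S.intersection(*fixpoints)      # least = intersection of all fixpoints
```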

We can now define a state-based propositional logic of knowledge for a given
distributed system, with temporal operators and a greatest fixpoint definition construct.


B.1 Syntax

The primitive nonlogical symbols of our language consist of a set $\Phi = \{P, Q, \ldots\}$
of primitive propositions, a single distinguished propositional variable $Z$, and auxiliary
propositional variables $X, Y, X_1, Y_1, \ldots$. Formulas of the language are defined
inductively by:
a) $Z$ is a formula.

b) $P$ is a formula for all primitive propositions $P \in \Phi$.

c) $\neg\varphi$ is a formula if $\varphi$ is a formula.

d) $\varphi \wedge \psi$ is a formula if both $\varphi$ and $\psi$ are formulas.

e) $\bigcirc_\delta \varphi$ is a formula if $\varphi$ is a formula and $\delta$ is a real number.

f) $\Diamond\varphi$ is a formula if $\varphi$ is a formula.

g) $K_i\varphi$ is a formula if $\varphi$ is a formula and $i \in \{1, \ldots, n\}$.

h) $\nu X.\varphi[Z/X]$ is a formula if $\varphi[Z]$ is a formula in which all occurrences of $Z$ are
positive (i.e., are within the scope of an even number of negation signs $\neg$), $X$ is
an auxiliary variable that does not occur in $\varphi[Z]$, and $\varphi[Z/X]$ is the result of
replacing all occurrences of $Z$ in $\varphi[Z]$ by $X$.
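Clause h) is a purely syntactic side condition. A minimal sketch of the positivity check, assuming a hypothetical nested-tuple encoding of formulas (the tags are illustrative, not the thesis's notation), counts the negations on the path from the root down to each occurrence of $Z$.

```python
# Hypothetical tuple encoding of formulas:
#   ("Z",), ("P", name), ("not", f), ("and", f, g),
#   ("next", delta, f), ("eventually", f), ("K", i, f)

def z_occurrences_positive(phi, negations=0):
    """True if every occurrence of Z in phi lies under an even number of negations."""
    tag = phi[0]
    if tag == "Z":
        return negations % 2 == 0
    if tag == "P":
        return True
    if tag == "not":
        return z_occurrences_positive(phi[1], negations + 1)
    if tag == "and":
        return (z_occurrences_positive(phi[1], negations)
                and z_occurrences_positive(phi[2], negations))
    if tag in ("next", "K"):       # operand is the third component
        return z_occurrences_positive(phi[2], negations)
    if tag == "eventually":
        return z_occurrences_positive(phi[1], negations)
    raise ValueError(f"unknown formula tag: {tag}")

E_Z = ("and", ("K", 1, ("Z",)), ("K", 2, ("Z",)))            # E Z with two processors
print(z_occurrences_positive(("and", ("P", "p"), E_Z)))       # True
print(z_occurrences_positive(("not", ("K", 1, ("Z",)))))      # False
print(z_occurrences_positive(("not", ("not", ("Z",)))))       # True
```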

B.2  Semantics 

Given a distributed system represented by its set of runs $R$, let $S = R \times (-\infty, \infty)$.
A model $\mathcal{M}$ is a triple $(S, \pi, \sigma)$, where $S$ is as above, $\pi : S \times \Phi \to \{\mathbf{true}, \mathbf{false}\}$ is
an assignment of truth values to the primitive propositions at the points of $S$, and
$\sigma : \{1, \ldots, n\} \times S \to \Sigma$ is an assignment of states (from a set of states $\Sigma$) to the
processors at the points of $S$. Formally, a formula $\varphi$ in our language is interpreted
as an operator $\varphi^{\mathcal{M}}$ from subsets of $S$ to subsets of $S$. The operator $\varphi^{\mathcal{M}}$ is defined
inductively as follows:

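a) $Z^{\mathcal{M}}(A) = A$.

b) $P^{\mathcal{M}}(A) = \{s \in S : \pi(s, P) = \mathbf{true}\}$.

c) $(\neg\varphi)^{\mathcal{M}}(A) = S - \varphi^{\mathcal{M}}(A)$.

d) $(\varphi \wedge \psi)^{\mathcal{M}}(A) = \varphi^{\mathcal{M}}(A) \cap \psi^{\mathcal{M}}(A)$.

e) $(\bigcirc_\delta\varphi)^{\mathcal{M}}(A) = \{s \in S : s = (r, t) \text{ and } (r, t+\delta) \in \varphi^{\mathcal{M}}(A)\}$.

f) $(\Diamond\varphi)^{\mathcal{M}}(A) = \bigcup_{\delta \ge 0} (\bigcirc_\delta\varphi)^{\mathcal{M}}(A)$.

g) $(K_i\varphi)^{\mathcal{M}}(A) = \{s \in S : \text{for all } s' \in S,\ \sigma(i, s) = \sigma(i, s') \text{ implies } s' \in \varphi^{\mathcal{M}}(A)\}$.

h) $(\nu X.\varphi[Z/X])^{\mathcal{M}}(A) = \bigcup\{B : (\varphi[Z])^{\mathcal{M}}(B) = B\}$.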

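We define $(\mathcal{M}, r, t) \models \varphi$ if $(r, t) \in \varphi^{\mathcal{M}}(\emptyset)$. It can be checked that for formulas
$\varphi$ in which no variables or greatest fixpoint operators appear, this definition is
consistent with our previous definition; i.e., $(\mathcal{M}, r, t) \models \varphi$ iff $\varphi$ holds at $(r, t)$ under the earlier semantics.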
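The inductive clauses above transcribe almost directly into an evaluator. The sketch below is only an approximation of the semantics: it assumes a hypothetical finite model (two runs, integer times 0 to 2 instead of real times in $(-\infty, \infty)$, integer $\delta$, and made-up $\pi$ and $\sigma$), computes $\varphi^{\mathcal{M}}(A)$ by structural recursion, and evaluates the $\nu$ clause by iterating down from $S$, which reaches the greatest fixpoint on a finite set when the body is monotone.

```python
# Hypothetical finite model: two runs and integer times 0..2 (an assumption;
# the thesis uses real times). Points past the horizon simply do not exist.
RUNS = ("r1", "r2")
TIMES = range(3)
POINTS = frozenset((r, t) for r in RUNS for t in TIMES)

# pi: the primitive proposition "p" holds everywhere except at (r2, 2).
PI = {(r, t): {"p": not (r == "r2" and t == 2)} for r in RUNS for t in TIMES}

def sigma(i, point):
    """Local state of processor i: processor 1 sees only the time,
    processor 2 sees the whole point."""
    return point[1] if i == 1 else point

def op(phi, A):
    """Return phi^M(A) for a formula encoded as a nested tuple:
    ("Z",), ("P", name), ("not", f), ("and", f, g), ("next", delta, f),
    ("eventually", f), ("K", i, f), or ("nu", f), where f uses ("Z",)
    for the bound variable and every occurrence of Z in f is positive."""
    tag = phi[0]
    if tag == "Z":
        return A
    if tag == "P":
        return frozenset(s for s in POINTS if PI[s][phi[1]])
    if tag == "not":
        return POINTS - op(phi[1], A)
    if tag == "and":
        return op(phi[1], A) & op(phi[2], A)
    if tag == "next":
        delta, body = phi[1], op(phi[2], A)
        return frozenset((r, t) for (r, t) in POINTS if (r, t + delta) in body)
    if tag == "eventually":
        body = op(phi[1], A)
        return frozenset((r, t) for (r, t) in POINTS
                         if any((r, u) in body for u in TIMES if u >= t))
    if tag == "K":
        i, body = phi[1], op(phi[2], A)
        return frozenset(s for s in POINTS
                         if all(s2 in body for s2 in POINTS
                                if sigma(i, s2) == sigma(i, s)))
    if tag == "nu":
        # Greatest fixpoint of B -> body^M(B); the argument A is ignored, as in
        # clause h). The decreasing iteration from POINTS terminates (finite set).
        B = POINTS
        while op(phi[1], B) != B:
            B = op(phi[1], B)
        return B
    raise ValueError(f"unknown formula tag: {tag}")

P = ("P", "p")
E_Z = ("and", ("K", 1, ("Z",)), ("K", 2, ("Z",)))
C_P = ("nu", ("and", P, E_Z))   # C p = nu X.(p and E X)
print(sorted(op(C_P, frozenset())))
# [('r1', 0), ('r1', 1), ('r2', 0), ('r2', 1)]
```

Here $p$ holds at $(r_1, 2)$, yet it is not common knowledge there, because processor 1 cannot distinguish that point from $(r_2, 2)$.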
We can extend the syntax of our language by defining $\vee$, $\supset$, and $\equiv$ in terms of $\wedge$
and $\neg$; defining $E\varphi$ ("everyone knows $\varphi$") and $E^{\Diamond}\varphi$ ("everyone will eventually know
$\varphi$") as $K_1\varphi \wedge K_2\varphi \wedge \cdots \wedge K_n\varphi$ and $\Diamond K_1\varphi \wedge \Diamond K_2\varphi \wedge \cdots \wedge \Diamond K_n\varphi$, respectively; and
defining the least fixpoint construct $\mu X.\varphi[Z/X]$ as $\neg\nu X.\neg\varphi[Z/\neg X]$. (Notice that
if all occurrences of $Z$ in $\varphi[Z]$ are positive, then all occurrences of $X$ in $\neg\varphi[Z/\neg X]$
are also positive.)
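A quick sanity check of the least fixpoint abbreviation, on a hypothetical five-element set: if $f$ is the monotone operator of $\varphi[Z]$, then $\neg\varphi[Z/\neg X]$ corresponds to $A \mapsto S - f(S - A)$, which is again monotone, and the complement of its greatest fixpoint should coincide with the least fixpoint of $f$. The operator $f$ below is made up for illustration.

```python
# Hypothetical universe and monotone operator standing in for phi[Z]^M.
S = frozenset(range(5))

def f(A):
    return frozenset({0}) | frozenset(x for x in A if x % 2 == 0)

def dual(op):
    """Operator for  not phi[Z / not X]:  A -> S - op(S - A) (monotone if op is)."""
    return lambda A: S - op(S - A)

def iterate(op, start):
    """Iterate op from start until it stabilizes (finite universe)."""
    current = start
    while op(current) != current:
        current = op(current)
    return current

lfp_direct = iterate(f, frozenset())     # mu X.phi, computed from below
lfp_via_nu = S - iterate(dual(f), S)     # not nu X. not phi[Z / not X]
print(sorted(lfp_direct), sorted(lfp_via_nu))  # [0] [0]
```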

We  state  without  proof  the  following  facts  (cf.  [Koz]): 

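(i) If all occurrences of $Z$ in $\varphi[Z]$ are positive, then $(\varphi[Z])^{\mathcal{M}}$ is a monotone operator.
By the Knaster–Tarski theorem, this implies that $\nu X.\varphi[Z/X]$ is well defined.

(ii) If all occurrences of $Z$ in $\varphi[Z]$ are positive, then $(\varphi[Z])^{\mathcal{M}}$ is also continuous, and
thus

$$(\nu X.\varphi[Z/X])^{\mathcal{M}} = \bigcap_{n<\omega} \big((\varphi[Z])^{\mathcal{M}}\big)^n(S).$$

(iii) A formula in which $Z$ does not appear is called closed. If $\varphi$ is a closed formula,
then $\varphi^{\mathcal{M}}$ is a constant operator, i.e., there is a set $B \subseteq S$ such that $\varphi^{\mathcal{M}}(A) = B$
for all subsets $A$ of $S$. (In this case we denote $\varphi^{\mathcal{M}}(A)$ by $\varphi^{\mathcal{M}}$.) In particular,
$(\mathbf{true})^{\mathcal{M}} = S$ and $(\mathbf{false})^{\mathcal{M}} = \emptyset$.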

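Given the machinery at our disposal, we can now define $C\varphi$ as $\nu X.(\varphi \wedge EX)$,
define $C^{\varepsilon}\varphi$ as $\nu X.(\varphi \wedge \bigcirc_{\varepsilon} EX)$, and define $C^{\Diamond}\varphi$ as $\nu X.(\varphi \wedge E^{\Diamond}X)$. By continuity,
we get that

$$C\varphi \equiv \varphi \wedge (\varphi \wedge E\varphi) \wedge (\varphi \wedge E\varphi \wedge EE\varphi) \wedge \cdots.$$

The analogous fact holds for $C^{\varepsilon}\varphi$ and $C^{\Diamond}\varphi$. Notice that in the case of $C\varphi$ and $C^{\varepsilon}\varphi$
we have that

$$C\varphi \equiv \varphi \wedge E\varphi \wedge EE\varphi \wedge \cdots, \quad \text{and that} \quad C^{\varepsilon}\varphi \equiv \varphi \wedge (\bigcirc_{\varepsilon}E)\varphi \wedge (\bigcirc_{\varepsilon}E)^2\varphi \wedge \cdots.$$

However, $C^{\Diamond}\varphi$ is not equivalent to $\varphi \wedge E^{\Diamond}\varphi \wedge (E^{\Diamond})^2\varphi \wedge \cdots$. We encourage the reader
to check that $C\varphi$, $C^{\varepsilon}\varphi$, and $C^{\Diamond}\varphi$ indeed satisfy the axioms (1)–(3) of Section 4.1,
and that $C\varphi$ satisfies $\neg C\varphi \supset C\neg C\varphi$.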
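The equivalence between the fixpoint definition $C\varphi = \nu X.(\varphi \wedge EX)$ and the conjunction $\varphi \wedge E\varphi \wedge EE\varphi \wedge \cdots$ can be checked on a small example. The sketch below assumes a hypothetical eight-point model with two processors (not a model from the thesis) and treats formulas as sets of points. It also shows that $E^4\varphi$ can hold at a point where $C\varphi$ fails.

```python
# Hypothetical model: eight points and two processors; each processor's local
# state at a point determines which points it cannot tell apart.
POINTS = frozenset(range(8))
STATES = {
    1: {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c", 6: "d", 7: "d"},
    2: {0: "u", 1: "v", 2: "v", 3: "w", 4: "w", 5: "x", 6: "y", 7: "y"},
}

def K(i, A):
    """Points at which processor i knows A (total view interpretation)."""
    return frozenset(s for s in POINTS
                     if all(t in A for t in POINTS if STATES[i][t] == STATES[i][s]))

def E(A):
    """Everyone knows A: the intersection of K_i(A) over all processors."""
    result = POINTS
    for i in STATES:
        result = result & K(i, A)
    return result

def common_knowledge(phi):
    """C phi as the greatest fixpoint of X -> phi & E(X), iterated down from POINTS."""
    X = POINTS
    while (phi & E(X)) != X:
        X = phi & E(X)
    return X

def iterated_conjunction(phi):
    """phi & E phi & EE phi & ...: since E(A) <= A, the terms E^k(phi)
    decrease, so the conjunction is the first term that E leaves unchanged."""
    level = phi
    while E(level) != level:
        level = E(level)
    return level

PHI = frozenset({0, 1, 2, 3, 4, 6, 7})
print(sorted(common_knowledge(PHI)))      # [6, 7]
print(sorted(iterated_conjunction(PHI)))  # [6, 7]
print(0 in E(E(E(E(PHI)))))               # True: E^4 phi holds at 0, C phi does not
```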

It  is  straightforward  to  extend  the  above  framework  to  include  reference  to  sets 
G  of  processors,  in  order  to  define  variants  of  common  knowledge  parameterized  by 
G.  It  is  also  possible  to  extend  it  to  include  explicit  individual  clock  times  in  order 
to define $C^T\varphi$, and to add likelihood, probability, etc., in order to define all of the
other  variants  of  common  knowledge  introduced  in  the  thesis. 